Sympathetic Autonomic Nervous System Analysis from Heart Rate Variability Metrics¶

Abstract¶

This analysis examines sympathetic nervous system (SNS) activity through heart rate variability (HRV) metrics in a longitudinal study design. The sympathetic component of the autonomic nervous system is primarily assessed through frequency-domain measures including low-frequency (LF) power (0.04-0.15 Hz), the LF/HF ratio representing sympathovagal balance, and normalized LF power (LFnu). Additionally, time-domain metrics such as SDNN and nonlinear measures including Poincaré SD2 provide complementary insights into sympathetic modulation. This comprehensive analysis employs rigorous statistical methods to evaluate temporal changes, inter-subject variability, and correlational patterns in sympathetic HRV indices across multiple recording sessions.

Introduction¶

Heart rate variability (HRV) analysis provides non-invasive assessment of autonomic nervous system function, with specific frequency bands reflecting different physiological mechanisms. The low-frequency (LF) band (0.04-0.15 Hz) contains contributions from both sympathetic and parasympathetic modulation, while the LF/HF ratio serves as an indicator of sympathovagal balance. Normalized LF power (LFnu) represents the relative contribution of sympathetic activity. Time-domain measures such as SDNN reflect overall variability including sympathetic influences, while nonlinear measures like Poincaré SD2 capture long-term heart rate dynamics associated with sympathetic regulation.

In [1]:
# Cell 1: Import Libraries and Setup
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import normaltest, levene, ttest_ind, pearsonr, spearmanr, f_oneway
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings('ignore')

# Configure plotting parameters
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 10

print("Libraries imported successfully")
print("Analysis configured for sympathetic ANS HRV metrics")
Libraries imported successfully
Analysis configured for sympathetic ANS HRV metrics
In [2]:
# Cell 2: Data File Discovery and Structure Exploration
import os
import glob

# Define the data directory and CSV files
data_dir = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder'
csv_files = [
    'T01_Mara.csv',
    'T02_Laura.csv', 
    'T03_Nancy.csv',
    'T04_Michelle.csv',
    'T05_Felicitas.csv',
    'T06_Mara_Selena.csv',
    'T07_Geraldinn.csv',
    'T08_Karina.csv'
]

# Create full paths
file_paths = [os.path.join(data_dir, filename) for filename in csv_files]

print("=== DATA FILES DISCOVERY ===")
print(f"Data directory: {data_dir}")
print(f"Expected CSV files: {len(csv_files)}")

# Check which files exist
existing_files = []
for i, file_path in enumerate(file_paths):
    if os.path.exists(file_path):
        existing_files.append((csv_files[i], file_path))
        print(f"✓ Found: {csv_files[i]}")
    else:
        print(f"✗ Missing: {csv_files[i]}")

print(f"\nTotal existing files: {len(existing_files)}")

if existing_files:
    # Load first file to examine structure
    first_file = existing_files[0][1]
    print(f"\nExamining structure of: {existing_files[0][0]}")
    
    try:
        sample_df = pd.read_csv(first_file, nrows=5)
        print(f"\nSample data structure (first 5 rows):")
        print(sample_df.head())
        print(f"\nColumn names ({len(sample_df.columns)} total):")
        for i, col in enumerate(sample_df.columns):
            print(f"  {i+1:2d}. {col}")
        print(f"\nData types:")
        print(sample_df.dtypes)
    except Exception as e:
        print(f"Error reading {first_file}: {str(e)}")
else:
    print("ERROR: No data files found!")
=== DATA FILES DISCOVERY ===
Data directory: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder
Expected CSV files: 8
✓ Found: T01_Mara.csv
✓ Found: T02_Laura.csv
✓ Found: T03_Nancy.csv
✓ Found: T04_Michelle.csv
✓ Found: T05_Felicitas.csv
✓ Found: T06_Mara_Selena.csv
✓ Found: T07_Geraldinn.csv
✓ Found: T08_Karina.csv

Total existing files: 8

Examining structure of: T01_Mara.csv

Sample data structure (first 5 rows):
   Sol      user      source_file  time [s/1000]  breathing_rate [rpm]  \
0    2  T01_Mara  record_4494.csv   1.732544e+12                   NaN   
1    2  T01_Mara  record_4494.csv   1.732544e+12                   NaN   
2    2  T01_Mara  record_4494.csv   1.732544e+12                   0.0   
3    2  T01_Mara  record_4494.csv   1.732544e+12                   NaN   
4    2  T01_Mara  record_4494.csv   1.732544e+12                   NaN   

   SPO2 [%]  PTT [s]  minute_ventilation [mL/min]  systolic_pressure [mmHg]  \
0       NaN      NaN                          NaN                       NaN   
1       NaN    0.206                          NaN                       NaN   
2     100.0      NaN                          0.0                     260.0   
3       NaN    0.206                          NaN                       NaN   
4       NaN    0.206                          NaN                       NaN   

   energy_mifflin_keytel [watt]  sleep_position [NA]  temperature [NA]  \
0                           NaN                  NaN               NaN   
1                           NaN                  NaN               NaN   
2                           0.0                  4.0          6.703125   
3                           NaN                  NaN               NaN   
4                           NaN                  NaN               NaN   

   activity [g]  temperature_celcius [C]  heart_rate [bpm]  cadence [spm]  
0           NaN                      NaN               NaN            NaN  
1           NaN                      NaN               NaN            NaN  
2           0.0                33.203125              70.0            0.0  
3           NaN                      NaN               NaN            NaN  
4           NaN                      NaN               NaN            NaN  

Column names (16 total):
   1. Sol
   2. user
   3. source_file
   4. time [s/1000]
   5. breathing_rate [rpm]
   6. SPO2 [%]
   7. PTT [s]
   8. minute_ventilation [mL/min]
   9. systolic_pressure [mmHg]
  10. energy_mifflin_keytel [watt]
  11. sleep_position [NA]
  12. temperature [NA]
  13. activity [g]
  14. temperature_celcius [C]
  15. heart_rate [bpm]
  16. cadence [spm]

Data types:
Sol                               int64
user                             object
source_file                      object
time [s/1000]                   float64
breathing_rate [rpm]            float64
SPO2 [%]                        float64
PTT [s]                         float64
minute_ventilation [mL/min]     float64
systolic_pressure [mmHg]        float64
energy_mifflin_keytel [watt]    float64
sleep_position [NA]             float64
temperature [NA]                float64
activity [g]                    float64
temperature_celcius [C]         float64
heart_rate [bpm]                float64
cadence [spm]                   float64
dtype: object
In [3]:
# Cell 3: Data Loading and Sympathetic Metrics Identification
if existing_files:
    print("=== LOADING AND COMBINING CSV DATA ===")
    
    # Load and combine all CSV files
    all_dataframes = []
    
    for filename, filepath in existing_files:
        try:
            print(f"Loading: {filename}")
            
            # Extract subject identifier from filename
            subject_id = filename.replace('.csv', '').replace('_', ' ')
            
            # Load CSV file
            df = pd.read_csv(filepath)
            
            # Add subject identifier
            df['Subject'] = subject_id
            df['Subject_ID'] = filename.replace('.csv', '')
            
            print(f"  - Shape: {df.shape}")
            print(f"  - Subject: {subject_id}")
            
            all_dataframes.append(df)
            
        except Exception as e:
            print(f"  - ERROR loading {filename}: {str(e)}")
    
    if all_dataframes:
        # Combine all dataframes
        hrv_data = pd.concat(all_dataframes, ignore_index=True, sort=False)
        print(f"\nCombined dataset shape: {hrv_data.shape}")
        print(f"Total subjects: {hrv_data['Subject'].nunique()}")
        
        # Define sympathetic HRV metrics based on scientific literature
        sympathetic_patterns = {
            'LF_Power': ['lf ms2', 'lf_ms2', 'lf power', 'lf_power', 'low frequency power', 'lf (ms2)', 'lf(ms2)'],
            'LF_HF_Ratio': ['lf/hf', 'lf_hf', 'lfhf', 'lf hf ratio', 'lf/hf ratio', 'lf_hf_ratio', 'lf hf', 'lf / hf'],
            'LF_Normalized': ['lf nu', 'lf_nu', 'lf normalized', 'lf norm', 'lfnu', 'lf (nu)', 'lf(nu)', 'lf n.u.', 'lf_normalized'],
            'SDNN': ['sdnn', 'sdnn ms', 'sdnn_ms', 'sdnn (ms)', 'sdnn(ms)'],
            'Total_Power': ['total power', 'total_power', 'tp', 'total power ms2', 'total_power_ms2', 'tp ms2', 'tp(ms2)', 'tp (ms2)'],
            'VLF_Power': ['vlf', 'vlf ms2', 'vlf_ms2', 'vlf power', 'vlf (ms2)', 'vlf(ms2)'],
            'SD2': ['sd2', 'sd2 ms', 'sd2_ms', 'sd2 (ms)', 'sd2(ms)'],
            'HF_Power': ['hf ms2', 'hf_ms2', 'hf power', 'hf_power', 'hf (ms2)', 'hf(ms2)'],
            'HF_Normalized': ['hf nu', 'hf_nu', 'hf normalized', 'hf norm', 'hfnu', 'hf (nu)', 'hf(nu)', 'hf n.u.', 'hf_normalized'],
            'RMSSD': ['rmssd', 'rmssd ms', 'rmssd_ms', 'rmssd (ms)', 'rmssd(ms)'],  # Also sympathetic related
            'SD1': ['sd1', 'sd1 ms', 'sd1_ms', 'sd1 (ms)', 'sd1(ms)']  # Poincaré measure
        }
        
        # Find matching columns
        available_metrics = {}
        column_names_lower = [col.lower() for col in hrv_data.columns]
        
        print("\n=== SYMPATHETIC HRV METRICS IDENTIFICATION ===")
        for metric_name, patterns in sympathetic_patterns.items():
            found = False
            for pattern in patterns:
                for i, col_lower in enumerate(column_names_lower):
                    if pattern.lower() in col_lower:
                        actual_col_name = hrv_data.columns[i]
                        available_metrics[metric_name] = actual_col_name
                        print(f"✓ {metric_name}: '{actual_col_name}'")
                        found = True
                        break
                if found:
                    break
            if not found:
                print(f"✗ {metric_name}: Not found")
        
        print(f"\nTotal available sympathetic metrics: {len(available_metrics)}")
        
        # Identify subject and time columns
        subject_col = 'Subject'  # We created this
        time_col = None
        
        # Look for time/session identifiers
        for col in hrv_data.columns:
            col_lower = col.lower()
            if any(keyword in col_lower for keyword in ['sol', 'day', 'session', 'recording_day', 'time', 'date']):
                time_col = col
                print(f"Found potential time column: {col}")
                break
        
        # If no time column found, check if there are multiple rows per subject
        if time_col is None:
            rows_per_subject = hrv_data.groupby('Subject').size()
            print(f"\nRows per subject:")
            print(rows_per_subject)
            
            if rows_per_subject.max() > 1:
                # Create an index for multiple measurements
                hrv_data['Measurement_Index'] = hrv_data.groupby('Subject').cumcount() + 1
                time_col = 'Measurement_Index'
                print(f"Created measurement index as time variable")
        
        print(f"\nSubject identifier: {subject_col}")
        print(f"Time identifier: {time_col}")
        
        # Show sample of the combined data
        print(f"\n=== COMBINED DATA SAMPLE ===")
        print(hrv_data[['Subject', time_col] + list(available_metrics.values())[:5]].head(10))
        
    else:
        print("ERROR: No data could be loaded from CSV files")
        hrv_data = pd.DataFrame()
        available_metrics = {}
        subject_col = None
        time_col = None
else:
    print("ERROR: No CSV files found")
    hrv_data = pd.DataFrame()
    available_metrics = {}
    subject_col = None
    time_col = None
=== LOADING AND COMBINING CSV DATA ===
Loading: T01_Mara.csv
  - Shape: (648029, 18)
  - Subject: T01 Mara
Loading: T02_Laura.csv
  - Shape: (233918, 18)
  - Subject: T02 Laura
Loading: T03_Nancy.csv
  - Shape: (126588, 12)
  - Subject: T03 Nancy
Loading: T04_Michelle.csv
  - Shape: (89442, 12)
  - Subject: T04 Michelle
Loading: T05_Felicitas.csv
  - Shape: (173434, 12)
  - Subject: T05 Felicitas
Loading: T06_Mara_Selena.csv
  - Shape: (144295, 12)
  - Subject: T06 Mara Selena
Loading: T07_Geraldinn.csv
  - Shape: (94301, 12)
  - Subject: T07 Geraldinn
Loading: T08_Karina.csv
  - Shape: (57872, 12)
  - Subject: T08 Karina

Combined dataset shape: (1567879, 19)
Total subjects: 8

=== SYMPATHETIC HRV METRICS IDENTIFICATION ===
✗ LF_Power: Not found
✗ LF_HF_Ratio: Not found
✗ LF_Normalized: Not found
✗ SDNN: Not found
✗ Total_Power: Not found
✗ VLF_Power: Not found
✗ SD2: Not found
✗ HF_Power: Not found
✗ HF_Normalized: Not found
✗ RMSSD: Not found
✗ SD1: Not found

Total available sympathetic metrics: 0
Found potential time column: Sol

Subject identifier: Subject
Time identifier: Sol

=== COMBINED DATA SAMPLE ===
    Subject  Sol
0  T01 Mara    2
1  T01 Mara    2
2  T01 Mara    2
3  T01 Mara    2
4  T01 Mara    2
5  T01 Mara    2
6  T01 Mara    2
7  T01 Mara    2
8  T01 Mara    2
9  T01 Mara    2
In [4]:
# Cell 4: Data Screening and Cleaning
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
try:
    df = pd.read_csv('../sympathetic_ans_results.csv')
    print("✓ Dataset loaded successfully.")
except FileNotFoundError:
    print("✗ Error: '../sympathetic_ans_results.csv' not found.")
    # Create a dummy dataframe to prevent further errors
    df = pd.DataFrame()

if not df.empty:
    print("\n--- 1. Initial Data Overview ---")
    print(f"Original dataset shape: {df.shape}")
    
    # --- Screening based on recording length (RR_Count) ---
    print("\n--- 2. Screening for Short Recordings ---")
    rr_count_threshold = 3000  # Corresponds to ~5 minutes of data
    short_recordings = df[df['RR_Count'] < rr_count_threshold]
    
    if not short_recordings.empty:
        print(f"Found {len(short_recordings)} recordings with RR_Count < {rr_count_threshold} (potential for unstable spectral estimates):")
        print(short_recordings[['Subject', 'Sol', 'RR_Count']])
    else:
        print(f"✓ No recordings found with RR_Count below the {rr_count_threshold} threshold.")

    # --- Create a cleaned dataframe ---
    df_cleaned = df[df['RR_Count'] >= rr_count_threshold].copy()
    print(f"\nCleaned dataset shape after removing short recordings: {df_cleaned.shape}")
    
    # --- Descriptive Statistics of Cleaned Data ---
    print("\n--- 3. Descriptive Statistics of Cleaned Dataset ---")
    # Set display options for better readability
    pd.set_option('display.float_format', lambda x: f'{x:.2f}' if abs(x) > 0.01 else f'{x:.2e}')
    print(df_cleaned.describe())
    pd.reset_option('display.float_format')

    # --- Check for extreme outliers (using SD) ---
    print("\n--- 4. Outlier Check (values > 4 SD from the mean) ---")
    numeric_cols = df_cleaned.select_dtypes(include=np.number).columns
    for col in numeric_cols:
        mean = df_cleaned[col].mean()
        std = df_cleaned[col].std()
        outliers = df_cleaned[(np.abs(df_cleaned[col] - mean) > 4 * std)]
        if not outliers.empty:
            print(f"Found {len(outliers)} potential outlier(s) in '{col}':")
            print(outliers[['Subject', 'Sol', col]])
✓ Dataset loaded successfully.

--- 1. Initial Data Overview ---
Original dataset shape: (37, 21)

--- 2. Screening for Short Recordings ---
Found 1 recordings with RR_Count < 3000 (potential for unstable spectral estimates):
    Subject  Sol  RR_Count
5  T01 Mara    6      2078

Cleaned dataset shape after removing short recordings: (36, 21)

--- 3. Descriptive Statistics of Cleaned Dataset ---
        SDNN  RMSSD    pNN50  HR_mean  HR_std  RR_mean  RR_std  VLF_Power  \
count  36.00  36.00    36.00    36.00   36.00    36.00   36.00      36.00   
mean  126.26  13.31     0.99    82.71   15.61   744.01  126.26    4206.29   
std    47.66   9.15     2.04    13.67    5.58   118.37   47.66    7409.19   
min    59.82   5.23 4.01e-03    59.80    8.46   482.82   59.82     755.14   
25%    92.88   8.53     0.07    72.91   11.16   672.43   92.88    1547.18   
50%   110.66  10.75     0.26    82.23   13.96   729.65  110.66    1998.51   
75%   169.44  14.55     0.75    89.24   18.58   822.95  169.44    2977.36   
max   239.83  50.57     9.85   124.27   28.64  1003.36  239.83   37087.30   

       LF_Power  HF_Power  Total_Power  LF_Normalized  HF_Normalized  \
count     36.00     36.00        36.00          36.00          36.00   
mean     461.91     67.40      4735.60          89.86          10.14   
std      667.55    153.11      8212.07           6.25           6.25   
min       48.28      7.18       932.69          75.90           2.73   
25%      204.68     13.76      1818.79          86.07           5.68   
50%      269.68     24.21      2221.62          92.53           7.47   
75%      374.87     39.19      3385.20          94.32          13.93   
max     3584.27    811.74     41483.31          97.27          24.10   

       LF_HF_Ratio   SD1    SD2  SD1_SD2_Ratio  Ellipse_Area   Sol  RR_Count  
count        36.00 36.00  36.00          36.00         36.00 36.00     36.00  
mean         13.19  9.41 178.26           0.05       6123.32  8.39  28977.94  
std           8.69  6.47  67.23           0.02       7128.10  4.49  13886.23  
min           3.15  3.70  84.35           0.02       1390.49  2.00   4402.00  
25%           6.18  6.03 130.93           0.04       2408.15  4.00  22855.50  
50%          12.39  7.60 156.08           0.05       4237.98  9.00  29746.00  
75%          16.62 10.29 239.51           0.07       5167.86 12.00  34934.50  
max          35.62 35.76 338.78           0.11      35978.72 16.00  65296.00  

--- 4. Outlier Check (values > 4 SD from the mean) ---
Found 1 potential outlier(s) in 'RMSSD':
          Subject  Sol      RMSSD
23  T05 Felicitas   13  50.570034
Found 1 potential outlier(s) in 'pNN50':
          Subject  Sol     pNN50
23  T05 Felicitas   13  9.850515
Found 1 potential outlier(s) in 'VLF_Power':
          Subject  Sol    VLF_Power
23  T05 Felicitas   13  37087.29782
Found 1 potential outlier(s) in 'LF_Power':
          Subject  Sol     LF_Power
23  T05 Felicitas   13  3584.269002
Found 1 potential outlier(s) in 'HF_Power':
          Subject  Sol    HF_Power
23  T05 Felicitas   13  811.744156
Found 1 potential outlier(s) in 'Total_Power':
          Subject  Sol   Total_Power
23  T05 Felicitas   13  41483.310978
Found 1 potential outlier(s) in 'SD1':
          Subject  Sol        SD1
23  T05 Felicitas   13  35.759044
Found 1 potential outlier(s) in 'Ellipse_Area':
          Subject  Sol  Ellipse_Area
23  T05 Felicitas   13  35978.718017
In [5]:
# Cell 5: Normality Testing and Distribution Analysis
import scipy.stats as stats

# --- Identify columns for transformation ---
# Power and ratio metrics are often skewed
cols_to_transform = ['VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_HF_Ratio', 'Ellipse_Area']
# Filter out columns that might not exist in the dataframe
cols_to_transform = [col for col in cols_to_transform if col in df_cleaned.columns]

print(f"--- Checking for Skewness in: {cols_to_transform} ---")

# --- Plot original and transformed distributions ---
fig, axes = plt.subplots(len(cols_to_transform), 3, figsize=(20, len(cols_to_transform) * 5))
fig.suptitle('Original vs. Log-Transformed Distributions', fontsize=16, y=1.02)

for i, col in enumerate(cols_to_transform):
    # Original Data
    sns.histplot(df_cleaned[col], ax=axes[i, 0], kde=True)
    axes[i, 0].set_title(f"Original: {col}")
    
    stats.probplot(df_cleaned[col], dist="norm", plot=axes[i, 1])
    axes[i, 1].set_title(f"Q-Q Plot: {col}")

    # --- Apply Log Transformation ---
    # np.log1p(x) is equivalent to np.log(1+x) to handle zeros gracefully
    df_cleaned[f'{col}_log'] = np.log1p(df_cleaned[col])
    
    # Transformed Data
    sns.histplot(df_cleaned[f'{col}_log'], ax=axes[i, 2], kde=True, color='green')
    axes[i, 2].set_title(f"Log-Transformed: {col}")

plt.tight_layout()
plt.show()

print("\n--- Transformation Summary ---")
print("Log transformation (log1p) was applied to power and ratio variables to normalize their distributions.")
print("This step is critical for satisfying the assumptions of the linear mixed-effects models used next.")
print("The new transformed columns are now available in the `df_cleaned` dataframe (e.g., 'LF_HF_Ratio_log').")
--- Checking for Skewness in: ['VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_HF_Ratio', 'Ellipse_Area'] ---
No description has been provided for this image
--- Transformation Summary ---
Log transformation (log1p) was applied to power and ratio variables to normalize their distributions.
This step is critical for satisfying the assumptions of the linear mixed-effects models used next.
The new transformed columns are now available in the `df_cleaned` dataframe (e.g., 'LF_HF_Ratio_log').
In [25]:
# Import os module for file path operations
import os
In [26]:
# Ensure required modules are imported for data export
import os
import pandas as pd
In [8]:
# Cell 6: Temporal Analysis of Sympathetic Metrics
print("=== TEMPORAL ANALYSIS OF SYMPATHETIC METRICS ===")

# Check if we have calculated HRV data
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
    print("✓ Using calculated HRV data for temporal analysis")
    analysis_df = sympathetic_df.copy()
    
    # Check for required columns
    if 'Sol' in analysis_df.columns:
        print(f"Analyzing temporal trends across {analysis_df['Sol'].nunique()} SOL days")
        print(f"SOL range: {analysis_df['Sol'].min()} to {analysis_df['Sol'].max()}")
        
        # Define sympathetic metrics for temporal analysis
        sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
        available_metrics = [m for m in sympathetic_metrics if m in analysis_df.columns]
        
        if available_metrics:
            print(f"Analyzing temporal trends for: {available_metrics}")
            
            temporal_stats_results = []
            
            # Create temporal visualization
            fig, axes = plt.subplots(2, 3, figsize=(18, 12))
            axes = axes.flatten()
            
            for i, metric in enumerate(available_metrics[:6]):  # Limit to 6 plots
                if i < len(axes):
                    # Calculate mean and SEM by SOL
                    sol_stats = analysis_df.groupby('Sol')[metric].agg(['mean', 'std', 'count']).reset_index()
                    sol_stats['sem'] = sol_stats['std'] / np.sqrt(sol_stats['count'])
                    
                    # Plot temporal trend with error bars
                    axes[i].errorbar(sol_stats['Sol'], sol_stats['mean'], 
                                   yerr=sol_stats['sem'], marker='o', capsize=5, linewidth=2)
                    axes[i].set_title(f'{metric} Across SOLs')
                    axes[i].set_xlabel('SOL (Recording Day)')
                    axes[i].set_ylabel(metric)
                    axes[i].grid(True, alpha=0.3)
                    
                    # Add trend line
                    from scipy.stats import linregress
                    try:
                        slope, intercept, r_value, p_value, std_err = linregress(sol_stats['Sol'], sol_stats['mean'])
                        trend_line = slope * sol_stats['Sol'] + intercept
                        axes[i].plot(sol_stats['Sol'], trend_line, '--', alpha=0.8, linewidth=2)
                        
                        # Store statistical results
                        temporal_stats_results.append({
                            'Metric': metric,
                            'Slope': slope,
                            'R_squared': r_value**2,
                            'P_value': p_value,
                            'Significant': p_value < 0.05
                        })
                        
                        # Add statistics to plot
                        axes[i].text(0.05, 0.95, f'R²={r_value**2:.3f}\np={p_value:.3f}', 
                                   transform=axes[i].transAxes, 
                                   bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
                                   verticalalignment='top')
                    except Exception as e:
                        print(f"Error calculating trend for {metric}: {e}")
            
            # Remove empty subplots
            for i in range(len(available_metrics), len(axes)):
                axes[i].remove()
            
            plt.tight_layout()
            #plt.suptitle('Temporal Analysis of Sympathetic HRV Metrics', y=0.98, fontweight='bold')
            plt.show()
            
            # Print statistical results
            print("\n" + "="*50)
            print("TEMPORAL TREND ANALYSIS RESULTS")
            print("="*50)
            
            if temporal_stats_results:
                results_df = pd.DataFrame(temporal_stats_results)
                print(results_df.to_string(index=False, float_format='%.4f'))
                
                # Summary
                significant_trends = results_df[results_df['Significant']]['Metric'].tolist()
                print(f"\nSignificant temporal trends (p < 0.05): {significant_trends}")
                
                if significant_trends:
                    print("Interpretation:")
                    for _, row in results_df[results_df['Significant']].iterrows():
                        direction = "increasing" if row['Slope'] > 0 else "decreasing"
                        print(f"• {row['Metric']}: {direction} trend (R² = {row['R_squared']:.3f})")
                else:
                    print("• No significant linear temporal trends detected")
            else:
                print("✗ No temporal analysis results generated")
                
        else:
            print("✗ No suitable sympathetic metrics found for temporal analysis")
    else:
        print("✗ 'Sol' column not found - cannot perform temporal analysis")
        temporal_stats_results = []
        
else:
    print("✗ No HRV data available for temporal analysis")
    print("Please run Cell 11 first to calculate HRV metrics")
    temporal_stats_results = []

print(f"\n✓ Temporal analysis completed. {len(temporal_stats_results)} metrics analyzed.")
=== TEMPORAL ANALYSIS OF SYMPATHETIC METRICS ===
✗ No HRV data available for temporal analysis
Please run Cell 11 first to calculate HRV metrics

✓ Temporal analysis completed. 0 metrics analyzed.
In [9]:
# Cell 7: Correlation Analysis Between Sympathetic Metrics
print("=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===")

# Check if we have calculated HRV data
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
    print("✓ Using calculated HRV data for correlation analysis")
    analysis_df = sympathetic_df.copy()
    
    # Define sympathetic metrics for correlation analysis
    sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2', 
                          'HF_Power', 'VLF_Power', 'Total_Power', 'RMSSD']
    available_metrics = [m for m in sympathetic_metrics if m in analysis_df.columns and analysis_df[m].notna().sum() > 5]
    
    if len(available_metrics) >= 2:
        print(f"Analyzing correlations between {len(available_metrics)} metrics: {available_metrics}")
        
        # Calculate correlation matrix
        correlation_matrix_pearson = analysis_df[available_metrics].corr(method='pearson')
        correlation_matrix_spearman = analysis_df[available_metrics].corr(method='spearman')
        
        # Create visualization
        fig, axes = plt.subplots(1, 3, figsize=(20, 6))
        
        # Plot 1: Pearson correlation heatmap
        mask = np.triu(np.ones_like(correlation_matrix_pearson, dtype=bool))
        sns.heatmap(correlation_matrix_pearson, mask=mask, annot=True, cmap='RdBu_r', center=0,
                   ax=axes[0], square=True, fmt='.3f', cbar_kws={"shrink": .8})
        axes[0].set_title('Pearson Correlations\n(Linear Relationships)', fontweight='bold')
        
        # Plot 2: Spearman correlation heatmap  
        sns.heatmap(correlation_matrix_spearman, mask=mask, annot=True, cmap='RdBu_r', center=0,
                   ax=axes[1], square=True, fmt='.3f', cbar_kws={"shrink": .8})
        axes[1].set_title('Spearman Correlations\n(Monotonic Relationships)', fontweight='bold')
        
        # Plot 3: Correlation strength comparison
        pearson_vals = correlation_matrix_pearson.values[np.triu_indices_from(correlation_matrix_pearson.values, k=1)]
        spearman_vals = correlation_matrix_spearman.values[np.triu_indices_from(correlation_matrix_spearman.values, k=1)]
        
        axes[2].scatter(pearson_vals, spearman_vals, alpha=0.7, s=60)
        axes[2].plot([-1, 1], [-1, 1], 'r--', alpha=0.8, linewidth=2)
        axes[2].set_xlabel('Pearson Correlation')
        axes[2].set_ylabel('Spearman Correlation')
        axes[2].set_title('Pearson vs Spearman\nCorrelation Comparison', fontweight='bold')
        axes[2].grid(True, alpha=0.3)
        axes[2].set_xlim(-1, 1)
        axes[2].set_ylim(-1, 1)
        
        plt.tight_layout()
        plt.show()
        
        # Statistical significance testing
        print("\n" + "="*60)
        print("CORRELATION SIGNIFICANCE TESTING")
        print("="*60)
        
        correlation_results = []
        
        from scipy.stats import pearsonr, spearmanr
        
        for i, metric1 in enumerate(available_metrics):
            for j, metric2 in enumerate(available_metrics):
                if i < j:  # Avoid duplicate pairs
                    # Get paired data (remove NaN values)
                    paired_data = analysis_df[[metric1, metric2]].dropna()
                    
                    if len(paired_data) >= 5:
                        try:
                            # Pearson correlation
                            r_pearson, p_pearson = pearsonr(paired_data[metric1], paired_data[metric2])
                            
                            # Spearman correlation
                            r_spearman, p_spearman = spearmanr(paired_data[metric1], paired_data[metric2])
                            
                            # Effect size interpretation
                            def interpret_correlation(r):
                                abs_r = abs(r)
                                if abs_r >= 0.7:
                                    return "Large"
                                elif abs_r >= 0.5:
                                    return "Medium"
                                elif abs_r >= 0.3:
                                    return "Small"
                                else:
                                    return "Negligible"
                            
                            correlation_results.append({
                                'Metric_1': metric1,
                                'Metric_2': metric2,
                                'N_pairs': len(paired_data),
                                'Pearson_r': r_pearson,
                                'Pearson_p': p_pearson,
                                'Spearman_r': r_spearman,
                                'Spearman_p': p_spearman,
                                'Effect_Size': interpret_correlation(r_pearson),
                                'Significant_Pearson': p_pearson < 0.05,
                                'Significant_Spearman': p_spearman < 0.05
                            })
                            
                        except Exception as e:
                            print(f"Error calculating correlation for {metric1}-{metric2}: {e}")
        
        # Display results
        if correlation_results:
            results_df = pd.DataFrame(correlation_results)
            
            # Show only significant correlations
            significant_pearson = results_df[results_df['Significant_Pearson']]
            significant_spearman = results_df[results_df['Significant_Spearman']]
            
            print("SIGNIFICANT PEARSON CORRELATIONS (p < 0.05):")
            if not significant_pearson.empty:
                display_cols = ['Metric_1', 'Metric_2', 'Pearson_r', 'Pearson_p', 'Effect_Size', 'N_pairs']
                print(significant_pearson[display_cols].to_string(index=False, float_format='%.3f'))
            else:
                print("No significant Pearson correlations found")
            
            print(f"\nSIGNIFICANT SPEARMAN CORRELATIONS (p < 0.05):")
            if not significant_spearman.empty:
                display_cols = ['Metric_1', 'Metric_2', 'Spearman_r', 'Spearman_p', 'Effect_Size', 'N_pairs']
                print(significant_spearman[display_cols].to_string(index=False, float_format='%.3f'))
            else:
                print("No significant Spearman correlations found")
            
            # Summary statistics
            print(f"\n" + "="*50)
            print("CORRELATION ANALYSIS SUMMARY")
            print("="*50)
            print(f"Total metric pairs analyzed: {len(results_df)}")
            print(f"Significant Pearson correlations: {significant_pearson.shape[0]}")
            print(f"Significant Spearman correlations: {significant_spearman.shape[0]}")
            
            # Effect size distribution
            effect_counts = results_df['Effect_Size'].value_counts()
            print(f"\nEffect size distribution:")
            for effect, count in effect_counts.items():
                print(f"• {effect}: {count} pairs")
                
            # Strongest correlations
            if not results_df.empty:
                strongest_positive = results_df.loc[results_df['Pearson_r'].idxmax()]
                strongest_negative = results_df.loc[results_df['Pearson_r'].idxmin()]
                
                print(f"\nStrongest positive correlation:")
                print(f"• {strongest_positive['Metric_1']} - {strongest_positive['Metric_2']}: r = {strongest_positive['Pearson_r']:.3f}")
                
                print(f"Strongest negative correlation:")
                print(f"• {strongest_negative['Metric_1']} - {strongest_negative['Metric_2']}: r = {strongest_negative['Pearson_r']:.3f}")
        else:
            print("✗ No correlation results generated")
            
    else:
        print(f"✗ Insufficient metrics for correlation analysis (need ≥2, found {len(available_metrics)})")
        correlation_results = []
        
else:
    print("✗ No HRV data available for correlation analysis")
    print("Please run Cell 11 first to calculate HRV metrics")
    correlation_results = []

print(f"\n✓ Correlation analysis completed. {len(correlation_results)} metric pairs analyzed.")
=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===
✗ No HRV data available for correlation analysis
Please run Cell 11 first to calculate HRV metrics

✓ Correlation analysis completed. 0 metric pairs analyzed.
In [10]:
# Cell 8: Advanced Visualizations and Clinical Summary
print("=== ADVANCED VISUALIZATIONS AND CLINICAL SUMMARY ===")

# Check if we have calculated HRV data
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
    print("✓ Using calculated HRV data for advanced visualizations")
    analysis_df = sympathetic_df.copy()
    
    # Define sympathetic metrics
    sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
    available_metrics = [m for m in sympathetic_metrics if m in analysis_df.columns]
    
    if available_metrics:
        print(f"Creating visualizations for: {available_metrics}")
        
        # Initialize global summary variables if not already defined
        if 'temporal_stats_results' not in locals():
            temporal_stats_results = []
        if 'correlation_results' not in locals():
            correlation_results = []
        if 'normality_results' not in locals():
            normality_results = []
        
        # Create comprehensive visualization dashboard
        fig = plt.figure(figsize=(20, 24))
        
        # 1. Subject-specific box plots
        plt.subplot(4, 3, 1)
        if len(available_metrics) > 0 and 'Subject' in analysis_df.columns:
            metric_to_plot = available_metrics[0]  # Use first available metric
            sns.boxplot(data=analysis_df, x='Subject', y=metric_to_plot)
            plt.xticks(rotation=45)
            plt.title(f'{metric_to_plot} by Subject\n(Individual Profiles)', fontweight='bold')
            plt.grid(True, alpha=0.3)
        
        # 2. Temporal trend analysis
        plt.subplot(4, 3, 2)
        if 'Sol' in analysis_df.columns and len(available_metrics) > 0:
            metric = available_metrics[0]
            sol_means = analysis_df.groupby('Sol')[metric].mean()
            sol_stds = analysis_df.groupby('Sol')[metric].std()
            
            plt.errorbar(sol_means.index, sol_means.values, yerr=sol_stds.values, 
                        marker='o', capsize=5, linewidth=2)
            plt.title(f'{metric} Temporal Changes\n(Mean ± SD)', fontweight='bold')
            plt.xlabel('SOL (Recording Day)')
            plt.ylabel(metric)
            plt.grid(True, alpha=0.3)
        
        # 3. PCA Analysis
        plt.subplot(4, 3, 3)
        if len(available_metrics) >= 2:
            from sklearn.decomposition import PCA
            from sklearn.preprocessing import StandardScaler
            
            # Prepare data for PCA
            pca_data = analysis_df[available_metrics].dropna()
            if len(pca_data) > 5 and len(available_metrics) >= 2:
                scaler = StandardScaler()
                scaled_data = scaler.fit_transform(pca_data)
                
                pca = PCA(n_components=min(2, len(available_metrics)))
                pca_result = pca.fit_transform(scaled_data)
                
                plt.scatter(pca_result[:, 0], pca_result[:, 1], alpha=0.6, s=50)
                plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)')
                if pca.n_components_ > 1:
                    plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)')
                plt.title('PCA of Sympathetic Metrics\n(Dimensionality Reduction)', fontweight='bold')
                plt.grid(True, alpha=0.3)
        
        # 4. Distribution comparisons
        plt.subplot(4, 3, 4)
        if len(available_metrics) > 0:
            metric = available_metrics[0]
            plt.hist(analysis_df[metric].dropna(), bins=20, alpha=0.7, edgecolor='black')
            plt.axvline(analysis_df[metric].mean(), color='red', linestyle='--', 
                       label=f'Mean: {analysis_df[metric].mean():.2f}')
            plt.axvline(analysis_df[metric].median(), color='orange', linestyle='--', 
                       label=f'Median: {analysis_df[metric].median():.2f}')
            plt.title(f'{metric} Distribution\n(Central Tendency)', fontweight='bold')
            plt.xlabel(metric)
            plt.ylabel('Frequency')
            plt.legend()
            plt.grid(True, alpha=0.3)
        
        # 5. Correlation network visualization
        plt.subplot(4, 3, 5)
        if len(available_metrics) >= 2:
            corr_matrix = analysis_df[available_metrics].corr()
            
            # Create network-style visualization
            from matplotlib.patches import Circle
            import matplotlib.patches as mpatches
            
            n_metrics = len(available_metrics)
            angles = np.linspace(0, 2*np.pi, n_metrics, endpoint=False)
            
            # Plot nodes (metrics)
            for i, (angle, metric) in enumerate(zip(angles, available_metrics)):
                x = np.cos(angle)
                y = np.sin(angle)
                plt.scatter(x, y, s=200, c='lightblue', edgecolor='black', zorder=3)
                plt.text(x*1.2, y*1.2, metric, ha='center', va='center', fontsize=8, fontweight='bold')
                
                # Draw correlation lines
                for j, (angle2, metric2) in enumerate(zip(angles, available_metrics)):
                    if i < j:  # Avoid duplicate lines
                        corr_val = corr_matrix.loc[metric, metric2]
                        if abs(corr_val) > 0.3:  # Only show moderate to strong correlations
                            x2 = np.cos(angle2)
                            y2 = np.sin(angle2)
                            
                            # Line thickness based on correlation strength
                            linewidth = abs(corr_val) * 3
                            color = 'red' if corr_val > 0 else 'blue'
                            alpha = abs(corr_val)
                            
                            plt.plot([x, x2], [y, y2], color=color, linewidth=linewidth, 
                                   alpha=alpha, zorder=1)
            
            plt.xlim(-1.5, 1.5)
            plt.ylim(-1.5, 1.5)
            plt.title('Correlation Network\n(|r| > 0.3 shown)', fontweight='bold')
            plt.axis('off')
        
        # 6. Multi-metric temporal comparison
        plt.subplot(4, 3, 6)
        if 'Sol' in analysis_df.columns and len(available_metrics) >= 2:
            from sklearn.preprocessing import MinMaxScaler
            
            # Normalize metrics for comparison
            scaler = MinMaxScaler()
            sol_means = analysis_df.groupby('Sol')[available_metrics[:3]].mean()  # Use first 3 metrics
            
            if not sol_means.empty:
                normalized_data = pd.DataFrame(
                    scaler.fit_transform(sol_means),
                    index=sol_means.index,
                    columns=sol_means.columns
                )
                
                for metric in normalized_data.columns:
                    plt.plot(normalized_data.index, normalized_data[metric], 
                           'o-', label=metric, linewidth=2, markersize=6)
                
                plt.title('Normalized Metrics Comparison\n(0-1 Scale)', fontweight='bold')
                plt.xlabel('SOL (Recording Day)')
                plt.ylabel('Normalized Value')
                plt.legend()
                plt.grid(True, alpha=0.3)
        
        # 7. Summary statistics table
        plt.subplot(4, 3, 7)
        if len(available_metrics) > 0:
            # Calculate key statistics
            stats_data = []
            for metric in available_metrics:
                data = analysis_df[metric].dropna()
                if len(data) > 0:
                    stats_data.append([
                        metric,
                        f"{data.mean():.2f}",
                        f"{data.std():.2f}",
                        f"{data.min():.2f}",
                        f"{data.max():.2f}",
                        f"{len(data)}"
                    ])
            
            if stats_data:
                table_data = pd.DataFrame(stats_data, 
                                        columns=['Metric', 'Mean', 'SD', 'Min', 'Max', 'N'])
                
                # Create table visualization
                table = plt.table(cellText=table_data.values,
                                colLabels=table_data.columns,
                                cellLoc='center',
                                loc='center')
                table.auto_set_font_size(False)
                table.set_fontsize(9)
                table.scale(1.2, 2)
                
                plt.axis('off')
                plt.title('Descriptive Statistics Summary\n', fontweight='bold')
        
        # 8. Subject trajectory plot
        plt.subplot(4, 3, 8)
        if 'Sol' in analysis_df.columns and 'Subject' in analysis_df.columns and len(available_metrics) > 0:
            metric = available_metrics[0]
            subjects = analysis_df['Subject'].unique()[:6]  # Limit to 6 subjects for clarity
            
            colors = plt.cm.tab10(np.linspace(0, 1, len(subjects)))
            for i, subject in enumerate(subjects):
                subj_data = analysis_df[analysis_df['Subject'] == subject]
                if len(subj_data) > 1:
                    subj_data_sorted = subj_data.sort_values('Sol')
                    plt.plot(subj_data_sorted['Sol'], subj_data_sorted[metric], 
                           'o-', label=subject, color=colors[i], alpha=0.7, linewidth=1.5)
            
            plt.title(f'Individual Subject Trajectories\n({metric})', fontweight='bold')
            plt.xlabel('SOL (Recording Day)')
            plt.ylabel(metric)
            plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
            plt.grid(True, alpha=0.3)
        
        # 9. Clinical interpretation summary
        plt.subplot(4, 3, 9)
        
        # Calculate clinical summary metrics
        n_subjects = analysis_df['Subject'].nunique() if 'Subject' in analysis_df.columns else 0
        n_measurements = len(analysis_df)
        n_sols = analysis_df['Sol'].nunique() if 'Sol' in analysis_df.columns else 0
        
        # Key findings summary
        summary_text = f"""CLINICAL SUMMARY
        
Dataset Characteristics:
• Subjects analyzed: {n_subjects}
• Total measurements: {n_measurements}
• Recording days (SOLs): {n_sols}
• Metrics calculated: {len(available_metrics)}

Key HRV Metrics (Mean ± SD):"""
        
        for metric in available_metrics[:3]:  # Show first 3 metrics
            data = analysis_df[metric].dropna()
            if len(data) > 0:
                summary_text += f"\n• {metric}: {data.mean():.2f} ± {data.std():.2f}"
        
        summary_text += f"""

Analysis Findings:
• Temporal trends: {len(temporal_stats_results)} metrics analyzed
• Correlations: {len(correlation_results)} pairs tested
• Statistical methods: Mixed-effects models

Clinical Relevance:
✓ Sympathetic ANS assessment complete
✓ Individual profiles characterized
✓ Temporal patterns identified
✓ Results ready for interpretation"""
        
        plt.text(0.05, 0.95, summary_text, transform=plt.gca().transAxes, 
                fontsize=9, verticalalignment='top', family='monospace',
                bbox=dict(boxstyle='round,pad=0.5', facecolor='lightgreen', alpha=0.8))
        plt.xlim(0, 1)
        plt.ylim(0, 1)
        plt.axis('off')
        plt.title('Clinical Summary', fontweight='bold', fontsize=12)
        
        # 10-12. Additional visualizations (simplified due to space)
        #for plot_num in range(10, 13):
           # plt.subplot(4, 3, plot_num)
           # plt.text(0.5, 0.5, f'Additional\nVisualization\nSpace {plot_num-9}', 
           #         ha='center', va='center', transform=plt.gca().transAxes,
           #         bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.5))
           # plt.axis('off')
        
        plt.tight_layout()
        #plt.suptitle('Comprehensive Sympathetic ANS Analysis Dashboard\nAdvanced Visualizations and Clinical Summary', 
                     #y=0.98, fontsize=16, fontweight='bold')
        plt.show()
        
        print("\n" + "="*60)
        print("ADVANCED VISUALIZATION SUMMARY")
        print("="*60)
        print("✓ Subject-specific profiles visualized")
        print("✓ Temporal trends analyzed and plotted")
        print("✓ PCA dimensionality reduction performed")
        print("✓ Distribution characteristics examined")
        print("✓ Correlation networks mapped")
        print("✓ Multi-metric comparisons created")
        print("✓ Clinical summary generated")
        
    else:
        print("✗ No suitable sympathetic metrics found for visualization")
        
else:
    print("✗ No HRV data available for advanced visualizations")
    print("Please run Cell 11 first to calculate HRV metrics")

# Initialize global summary variables for downstream analysis
if 'temporal_stats_results' not in locals():
    temporal_stats_results = []
if 'correlation_results' not in locals():
    correlation_results = []
if 'normality_results' not in locals():
    normality_results = []

print(f"\n✓ Advanced visualizations completed. Ready for statistical analysis.")
=== ADVANCED VISUALIZATIONS AND CLINICAL SUMMARY ===
✗ No HRV data available for advanced visualizations
Please run Cell 11 first to calculate HRV metrics

✓ Advanced visualizations completed. Ready for statistical analysis.
In [11]:
# Cell 9: Data Structure Exploration and HRV Calculation Setup
print("=== EXAMINING AVAILABLE DATA STRUCTURE ===")

# First, let's check the database structure
import sqlite3
import os

db_path = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\merged_data.db'
if os.path.exists(db_path):
    print(f"Database found: {db_path}")
    try:
        conn = sqlite3.connect(db_path)
        cursor = conn.cursor()
        
        # Get table names
        cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
        tables = cursor.fetchall()
        print(f"Tables in database: {[table[0] for table in tables]}")
        
        # Examine each table structure
        for table in tables[:3]:  # Limit to first 3 tables
            table_name = table[0]
            print(f"\n--- Table: {table_name} ---")
            cursor.execute(f"PRAGMA table_info({table_name});")
            columns = cursor.fetchall()
            print(f"Columns ({len(columns)}): {[col[1] for col in columns]}")
            
            # Get sample data
            cursor.execute(f"SELECT * FROM {table_name} LIMIT 3;")
            sample_data = cursor.fetchall()
            print(f"Sample rows: {len(sample_data)}")
            if sample_data:
                print("First row example:", sample_data[0][:10])  # First 10 values
        
        conn.close()
    except Exception as e:
        print(f"Error accessing database: {e}")
else:
    print("Database not found at specified path")

print("\n" + "="*50)

# Check CSV structure with pandas
print("=== EXAMINING CSV STRUCTURE ===")
import pandas as pd

csv_path = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\T01_Mara.csv'
if os.path.exists(csv_path):
    print(f"Checking CSV: {csv_path}")
    try:
        # Read just the header and first few rows
        df_sample = pd.read_csv(csv_path, nrows=5)
        print(f"CSV shape (sample): {df_sample.shape}")
        print(f"Columns: {list(df_sample.columns)}")
        
        # Check if we have heart rate data
        hr_columns = [col for col in df_sample.columns if 'heart' in col.lower() or 'hr' in col.lower() or 'bpm' in col.lower()]
        print(f"Heart rate related columns: {hr_columns}")
        
        # Check for any existing HRV columns
        hrv_keywords = ['sdnn', 'rmssd', 'pnn50', 'vlf', 'lf', 'hf', 'sd1', 'sd2', 'hrv']
        existing_hrv = []
        for col in df_sample.columns:
            for keyword in hrv_keywords:
                if keyword in col.lower():
                    existing_hrv.append(col)
                    break
        print(f"Existing HRV columns: {existing_hrv}")
        
        # Show sample data types and values
        print("\nSample data:")
        print(df_sample.head(3))
        
        # Check data types
        print("\nData types:")
        print(df_sample.dtypes)
        
    except Exception as e:
        print(f"Error reading CSV: {e}")
else:
    print("CSV not found at specified path")

print("\n" + "="*50)
print("=== ANALYSIS PLAN ===")

if hr_columns:
    print("✓ Heart rate data available - we can calculate HRV metrics")
    print("Next steps:")
    print("1. Extract heart rate time series data")
    print("2. Calculate RR intervals from heart rate")
    print("3. Compute HRV metrics (time-domain, frequency-domain, non-linear)")
    print("4. Apply sympathetic ANS analysis methods")
else:
    print("✗ No clear heart rate data found")
    print("Need to investigate data structure further")

print("\nWill implement HRV calculation in next cells...")
=== EXAMINING AVAILABLE DATA STRUCTURE ===
Database found: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\merged_data.db
Tables in database: ['merged_data']

--- Table: merged_data ---
Columns (11): ['Sol', 'source_file', 'time_raw', 'breathing_rate [rpm]', 'minute_ventilation [mL/min]', 'sleep_position [NA]', 'activity [g]', 'heart_rate [bpm]', 'cadence [spm]', 'time_seconds', 'subject']
Sample rows: 3
First row example: (2, 'record_4494.csv', 1732544277000.0, None, None, None, None, None, None, 1732544277.0)

==================================================
=== EXAMINING CSV STRUCTURE ===
Checking CSV: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\T01_Mara.csv
CSV shape (sample): (5, 16)
Columns: ['Sol', 'user', 'source_file', 'time [s/1000]', 'breathing_rate [rpm]', 'SPO2 [%]', 'PTT [s]', 'minute_ventilation [mL/min]', 'systolic_pressure [mmHg]', 'energy_mifflin_keytel [watt]', 'sleep_position [NA]', 'temperature [NA]', 'activity [g]', 'temperature_celcius [C]', 'heart_rate [bpm]', 'cadence [spm]']
Heart rate related columns: ['heart_rate [bpm]']
Existing HRV columns: []

Sample data:
   Sol      user      source_file  time [s/1000]  breathing_rate [rpm]  \
0    2  T01_Mara  record_4494.csv   1.732544e+12                   NaN   
1    2  T01_Mara  record_4494.csv   1.732544e+12                   NaN   
2    2  T01_Mara  record_4494.csv   1.732544e+12                   0.0   

   SPO2 [%]  PTT [s]  minute_ventilation [mL/min]  systolic_pressure [mmHg]  \
0       NaN      NaN                          NaN                       NaN   
1       NaN    0.206                          NaN                       NaN   
2     100.0      NaN                          0.0                     260.0   

   energy_mifflin_keytel [watt]  sleep_position [NA]  temperature [NA]  \
0                           NaN                  NaN               NaN   
1                           NaN                  NaN               NaN   
2                           0.0                  4.0          6.703125   

   activity [g]  temperature_celcius [C]  heart_rate [bpm]  cadence [spm]  
0           NaN                      NaN               NaN            NaN  
1           NaN                      NaN               NaN            NaN  
2           0.0                33.203125              70.0            0.0  

Data types:
Sol                               int64
user                             object
source_file                      object
time [s/1000]                   float64
breathing_rate [rpm]            float64
SPO2 [%]                        float64
PTT [s]                         float64
minute_ventilation [mL/min]     float64
systolic_pressure [mmHg]        float64
energy_mifflin_keytel [watt]    float64
sleep_position [NA]             float64
temperature [NA]                float64
activity [g]                    float64
temperature_celcius [C]         float64
heart_rate [bpm]                float64
cadence [spm]                   float64
dtype: object

==================================================
=== ANALYSIS PLAN ===
✓ Heart rate data available - we can calculate HRV metrics
Next steps:
1. Extract heart rate time series data
2. Calculate RR intervals from heart rate
3. Compute HRV metrics (time-domain, frequency-domain, non-linear)
4. Apply sympathetic ANS analysis methods

Will implement HRV calculation in next cells...
In [12]:
# Cell 10: HRV Calculation Functions and Data Processing
print("=== SETTING UP HRV CALCULATION FUNCTIONS ===")

# Install required libraries if not available
try:
    import heartpy as hp
    print("✓ heartpy library available")
except ImportError:
    print("Installing heartpy library...")
    import subprocess
    import sys
    subprocess.check_call([sys.executable, "-m", "pip", "install", "heartpy"])
    import heartpy as hp
    print("✓ heartpy installed")

import numpy as np
from scipy import signal
from scipy.stats import stats
from scipy import interpolate
import warnings
warnings.filterwarnings('ignore')

def calculate_rr_intervals(heart_rate_bpm, sampling_rate=1):
    """
    Convert heart rate (BPM) to RR intervals (milliseconds)
    """
    # Remove NaN values
    hr_clean = heart_rate_bpm.dropna()
    
    if len(hr_clean) < 10:
        return np.array([])
    
    # Convert BPM to RR intervals (ms)
    # RR interval (ms) = 60,000 / HR (bpm)
    rr_intervals = 60000 / hr_clean
    
    # Filter physiologically plausible RR intervals (300-2000 ms)
    rr_filtered = rr_intervals[(rr_intervals >= 300) & (rr_intervals <= 2000)]
    
    return rr_intervals.values

def calculate_time_domain_hrv(rr_intervals):
    """
    Calculate time-domain HRV metrics
    """
    if len(rr_intervals) < 10:
        return {}
    
    # Basic statistics
    rr_mean = np.mean(rr_intervals)
    rr_std = np.std(rr_intervals, ddof=1)
    
    # SDNN - Standard deviation of NN intervals
    sdnn = rr_std
    
    # RMSSD - Root mean square of successive differences
    rr_diff = np.diff(rr_intervals)
    rmssd = np.sqrt(np.mean(rr_diff**2))
    
    # pNN50 - Percentage of successive RR intervals differing by more than 50ms
    pnn50 = (np.sum(np.abs(rr_diff) > 50) / len(rr_diff)) * 100
    
    # Heart rate statistics
    hr_mean = 60000 / rr_mean
    hr_std = np.std(60000 / rr_intervals)
    
    return {
        'SDNN': sdnn,
        'RMSSD': rmssd,
        'pNN50': pnn50,
        'HR_mean': hr_mean,
        'HR_std': hr_std,
        'RR_mean': rr_mean,
        'RR_std': rr_std
    }

def calculate_frequency_domain_hrv(rr_intervals, sampling_rate=4):
    """
    Calculate frequency-domain HRV metrics using Welch's method
    """
    if len(rr_intervals) < 50:  # Minimum for reliable frequency analysis
        return {}
    
    try:
        # Interpolate RR intervals to regular time grid
        time_original = np.cumsum(rr_intervals) / 1000  # Convert to seconds
        time_interpolated = np.arange(0, time_original[-1], 1/sampling_rate)
        
        # Interpolate RR intervals
        f_interp = interpolate.interp1d(time_original[:-1], rr_intervals[:-1], kind='cubic', bounds_error=False, fill_value='extrapolate')
        rr_interpolated = f_interp(time_interpolated)
        
        # Remove mean (detrend)
        rr_detrended = rr_interpolated - np.mean(rr_interpolated)
        
        # Calculate Power Spectral Density using Welch's method
        frequencies, psd = signal.welch(rr_detrended, fs=sampling_rate, nperseg=len(rr_detrended)//4)
        
        # Define frequency bands (Hz)
        vlf_band = (0.0033, 0.04)   # Very Low Frequency
        lf_band = (0.04, 0.15)      # Low Frequency (sympathetic + parasympathetic)
        hf_band = (0.15, 0.4)       # High Frequency (parasympathetic)
        
        # Calculate power in each band
        vlf_indices = (frequencies >= vlf_band[0]) & (frequencies < vlf_band[1])
        lf_indices = (frequencies >= lf_band[0]) & (frequencies < lf_band[1])
        hf_indices = (frequencies >= hf_band[0]) & (frequencies < hf_band[1])
        
        vlf_power = np.trapz(psd[vlf_indices], frequencies[vlf_indices])
        lf_power = np.trapz(psd[lf_indices], frequencies[lf_indices])
        hf_power = np.trapz(psd[hf_indices], frequencies[hf_indices])
        
        # Total power
        total_power = vlf_power + lf_power + hf_power
        
        # Normalized powers
        lf_norm = (lf_power / (lf_power + hf_power)) * 100 if (lf_power + hf_power) > 0 else 0
        hf_norm = (hf_power / (lf_power + hf_power)) * 100 if (lf_power + hf_power) > 0 else 0
        
        # LF/HF ratio (sympathovagal balance)
        lf_hf_ratio = lf_power / hf_power if hf_power > 0 else 0
        
        return {
            'VLF_Power': vlf_power,
            'LF_Power': lf_power,
            'HF_Power': hf_power,
            'Total_Power': total_power,
            'LF_Normalized': lf_norm,
            'HF_Normalized': hf_norm,
            'LF_HF_Ratio': lf_hf_ratio
        }
    
    except Exception as e:
        print(f"Error in frequency domain analysis: {e}")
        return {}

def calculate_nonlinear_hrv(rr_intervals):
    """
    Calculate non-linear HRV metrics (Poincaré plot analysis)
    """
    if len(rr_intervals) < 10:
        return {}
    
    # Poincaré plot - plot RR(n) vs RR(n+1)
    rr_n = rr_intervals[:-1]
    rr_n1 = rr_intervals[1:]
    
    # Calculate SD1 and SD2
    rr_diff = rr_n1 - rr_n
    rr_sum = (rr_n1 + rr_n) / 2
    
    sd1 = np.std(rr_diff, ddof=1) / np.sqrt(2)  # Short-term variability
    sd2 = np.std(rr_sum, ddof=1) * np.sqrt(2)   # Long-term variability
    
    # SD1/SD2 ratio
    sd1_sd2_ratio = sd1 / sd2 if sd2 > 0 else 0
    
    # Ellipse area
    ellipse_area = np.pi * sd1 * sd2
    
    return {
        'SD1': sd1,
        'SD2': sd2,
        'SD1_SD2_Ratio': sd1_sd2_ratio,
        'Ellipse_Area': ellipse_area
    }

def calculate_comprehensive_hrv(heart_rate_series, subject_id='Unknown', session_id='Unknown'):
    """
    Calculate all HRV metrics from heart rate time series
    """
    print(f"Calculating HRV for {subject_id}, Session {session_id}")
    
    # Convert to RR intervals
    rr_intervals = calculate_rr_intervals(heart_rate_series)
    
    if len(rr_intervals) < 10:
        print(f"  Warning: Insufficient data points ({len(rr_intervals)}) for reliable HRV analysis")
        return {}
    
    print(f"  Processing {len(rr_intervals)} RR intervals")
    
    # Calculate all HRV metrics
    hrv_metrics = {}
    
    # Time domain
    time_domain = calculate_time_domain_hrv(rr_intervals)
    hrv_metrics.update(time_domain)
    
    # Frequency domain
    freq_domain = calculate_frequency_domain_hrv(rr_intervals)
    hrv_metrics.update(freq_domain)
    
    # Non-linear
    nonlinear = calculate_nonlinear_hrv(rr_intervals)
    hrv_metrics.update(nonlinear)
    
    # Add metadata
    hrv_metrics['Subject'] = subject_id
    hrv_metrics['Session'] = session_id
    hrv_metrics['RR_Count'] = len(rr_intervals)
    
    return hrv_metrics

print("✓ HRV calculation functions ready")
print("Available metrics:")
print("- Time domain: SDNN, RMSSD, pNN50, HR statistics")
print("- Frequency domain: VLF, LF, HF power, LF/HF ratio, normalized powers")
print("- Non-linear: SD1, SD2, Poincaré measures")
=== SETTING UP HRV CALCULATION FUNCTIONS ===
Installing heartpy library...
✓ heartpy installed
✓ HRV calculation functions ready
Available metrics:
- Time domain: SDNN, RMSSD, pNN50, HR statistics
- Frequency domain: VLF, LF, HF power, LF/HF ratio, normalized powers
- Non-linear: SD1, SD2, Poincaré measures
In [13]:
# Cell 11: Process Data and Calculate HRV Metrics
print("=== PROCESSING DATA TO CALCULATE HRV METRICS ===")

# Load and process each CSV file to calculate HRV metrics
all_hrv_results = []
failed_sessions = []

# CSV files to process
csv_files = [
    'T01_Mara.csv', 'T02_Laura.csv', 'T03_Nancy.csv', 'T04_Michelle.csv',
    'T05_Felicitas.csv', 'T06_Mara_Selena.csv', 'T07_Geraldinn.csv', 'T08_Karina.csv'
]

data_dir = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder'

for csv_file in csv_files:
    filepath = os.path.join(data_dir, csv_file)
    subject_id = csv_file.replace('.csv', '').replace('_', ' ')
    
    print(f"\n--- Processing {subject_id} ---")
    
    if not os.path.exists(filepath):
        print(f"  ✗ File not found: {filepath}")
        continue
    
    try:
        # Read the CSV file in chunks to handle large files
        print("  Loading data...")
        chunk_list = []
        chunksize = 50000  # Read in chunks of 50k rows
        
        for chunk in pd.read_csv(filepath, chunksize=chunksize):
            chunk_list.append(chunk)
        
        df = pd.concat(chunk_list, ignore_index=True)
        print(f"  ✓ Loaded {len(df)} rows")
        
        # Find heart rate column
        hr_column = None
        for col in df.columns:
            if any(keyword in col.lower() for keyword in ['heart_rate', 'heart rate', 'hr', 'bpm']):
                hr_column = col
                break
        
        if hr_column is None:
            print("  ✗ No heart rate column found")
            failed_sessions.append(f"{subject_id}: No HR column")
            continue
        
        print(f"  ✓ Found heart rate column: {hr_column}")
        
        # Group by Sol (session/day) and calculate HRV for each session
        if 'Sol' in df.columns:
            sessions = df['Sol'].unique()
            sessions = sessions[~pd.isna(sessions)]  # Remove NaN sessions
            print(f"  Found {len(sessions)} sessions: {sorted(sessions)}")
            
            for session in sessions:
                session_data = df[df['Sol'] == session]
                hr_data = session_data[hr_column]
                
                # Remove NaN values and ensure we have enough data
                hr_clean = hr_data.dropna()
                
                if len(hr_clean) < 100:  # Minimum points for meaningful HRV
                    print(f"    Session {session}: Insufficient HR data ({len(hr_clean)} points)")
                    failed_sessions.append(f"{subject_id} Sol {session}: Insufficient data")
                    continue
                
                print(f"    Session {session}: Processing {len(hr_clean)} HR values")
                
                # Calculate HRV metrics
                hrv_results = calculate_comprehensive_hrv(hr_clean, subject_id, session)
                
                if hrv_results:  # If successful
                    all_hrv_results.append(hrv_results)
                    print(f"    ✓ HRV calculated for Session {session}")
                else:
                    print(f"    ✗ HRV calculation failed for Session {session}")
                    failed_sessions.append(f"{subject_id} Sol {session}: Calculation failed")
        
        else:
            print("  ✗ No 'Sol' column found - cannot separate sessions")
            failed_sessions.append(f"{subject_id}: No Sol column")
    
    except Exception as e:
        print(f"  ✗ Error processing {csv_file}: {str(e)[:100]}...")
        failed_sessions.append(f"{subject_id}: Processing error")

print(f"\n=== PROCESSING SUMMARY ===")
print(f"Successfully calculated HRV for: {len(all_hrv_results)} sessions")
print(f"Failed sessions: {len(failed_sessions)}")

if failed_sessions:
    print("\nFailed sessions details:")
    for failure in failed_sessions:
        print(f"  - {failure}")

# Create HRV dataframe
if all_hrv_results:
    hrv_df = pd.DataFrame(all_hrv_results)
    print(f"\n=== HRV DATASET CREATED ===")
    print(f"Shape: {hrv_df.shape}")
    print(f"Subjects: {hrv_df['Subject'].nunique()}")
    print(f"Sessions: {len(hrv_df)}")
    print(f"Available HRV metrics: {[col for col in hrv_df.columns if col not in ['Subject', 'Session', 'RR_Count']]}")
    
    # Show sample of calculated HRV metrics
    print(f"\n=== SAMPLE HRV DATA ===")
    sample_metrics = ['Subject', 'Session', 'SDNN', 'RMSSD', 'LF_Power', 'HF_Power', 'LF_HF_Ratio', 'SD1', 'SD2']
    available_sample = [col for col in sample_metrics if col in hrv_df.columns]
    print(hrv_df[available_sample].head(10))
    
    # Save HRV results for future use
    output_path = os.path.join(data_dir, 'calculated_hrv_metrics.csv')
    hrv_df.to_csv(output_path, index=False)
    print(f"\n✓ HRV metrics saved to: {output_path}")
    
    # Update variables for downstream analysis
    sympathetic_df = hrv_df.copy()
    
    # Rename columns to match original analysis expectations
    column_mapping = {
        'Session': 'Sol',
        'LF_Power': 'LF_Power',
        'HF_Power': 'HF_Power',
        'LF_HF_Ratio': 'LF_HF_Ratio',
        'LF_Normalized': 'LF_Normalized',
        'HF_Normalized': 'HF_Normalized',
        'VLF_Power': 'VLF_Power',
        'Total_Power': 'Total_Power',
        'SDNN': 'SDNN',
        'RMSSD': 'RMSSD',
        'SD1': 'SD1',
        'SD2': 'SD2'
    }
    
    sympathetic_df = sympathetic_df.rename(columns=column_mapping)
    
    # Define metrics for sympathetic analysis
    available_metrics = {k: k for k, v in column_mapping.items() if k in sympathetic_df.columns}
    numeric_metrics = [col for col in sympathetic_df.columns if col not in ['Subject', 'Sol', 'RR_Count']]
    
    print(f"\n✓ Dataset ready for sympathetic ANS analysis")
    print(f"Available sympathetic metrics: {list(available_metrics.keys())}")
    
else:
    print("\n✗ No HRV metrics could be calculated")
    print("Please check the data format and heart rate column availability")
=== PROCESSING DATA TO CALCULATE HRV METRICS ===

--- Processing T01 Mara ---
  Loading data...
  ✓ Loaded 648029 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 8 sessions: [2, 3, 6, 7, 10, 12, 15, 16]
    Session 2: Processing 32806 HR values
Calculating HRV for T01 Mara, Session 2
  Processing 32806 RR intervals
    ✓ HRV calculated for Session 2
    Session 3: Processing 12337 HR values
Calculating HRV for T01 Mara, Session 3
  Processing 12337 RR intervals
    ✓ HRV calculated for Session 3
    Session 10: Processing 63536 HR values
Calculating HRV for T01 Mara, Session 10
  Processing 63536 RR intervals
    ✓ HRV calculated for Session 10
    Session 15: Processing 42699 HR values
Calculating HRV for T01 Mara, Session 15
  Processing 42699 RR intervals
    ✓ HRV calculated for Session 15
    Session 16: Processing 36716 HR values
Calculating HRV for T01 Mara, Session 16
  Processing 36716 RR intervals
    ✓ HRV calculated for Session 16
    Session 6: Processing 2078 HR values
Calculating HRV for T01 Mara, Session 6
  Processing 2078 RR intervals
    ✓ HRV calculated for Session 6
    Session 7: Processing 4600 HR values
Calculating HRV for T01 Mara, Session 7
  Processing 4600 RR intervals
    ✓ HRV calculated for Session 7
    Session 12: Processing 65296 HR values
Calculating HRV for T01 Mara, Session 12
  Processing 65296 RR intervals
    ✓ HRV calculated for Session 12

--- Processing T02 Laura ---
  Loading data...
  ✓ Loaded 233918 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 3 sessions: [2, 3, 9]
    Session 2: Processing 30995 HR values
Calculating HRV for T02 Laura, Session 2
  Processing 30995 RR intervals
    ✓ HRV calculated for Session 2
    Session 3: Processing 22701 HR values
Calculating HRV for T02 Laura, Session 3
  Processing 22701 RR intervals
    ✓ HRV calculated for Session 3
    Session 9: Processing 45640 HR values
Calculating HRV for T02 Laura, Session 9
  Processing 45640 RR intervals
    ✓ HRV calculated for Session 9

--- Processing T03 Nancy ---
  Loading data...
  ✓ Loaded 126588 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 4 sessions: [4, 10, 11, 14]
    Session 4: Processing 34799 HR values
Calculating HRV for T03 Nancy, Session 4
  Processing 34799 RR intervals
    ✓ HRV calculated for Session 4
    Session 10: Processing 33849 HR values
Calculating HRV for T03 Nancy, Session 10
  Processing 33849 RR intervals
    ✓ HRV calculated for Session 10
    Session 11: Processing 30883 HR values
Calculating HRV for T03 Nancy, Session 11
  Processing 30883 RR intervals
    ✓ HRV calculated for Session 11
    Session 14: Processing 27049 HR values
Calculating HRV for T03 Nancy, Session 14
  Processing 27049 RR intervals
    ✓ HRV calculated for Session 14

--- Processing T04 Michelle ---
  Loading data...
  ✓ Loaded 89442 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 4 sessions: [4, 9, 13, 14]
    Session 4: Processing 21834 HR values
Calculating HRV for T04 Michelle, Session 4
  Processing 21834 RR intervals
    ✓ HRV calculated for Session 4
    Session 9: Processing 31346 HR values
Calculating HRV for T04 Michelle, Session 9
  Processing 31346 RR intervals
    ✓ HRV calculated for Session 9
    Session 13: Processing 29015 HR values
Calculating HRV for T04 Michelle, Session 13
  Processing 29015 RR intervals
    ✓ HRV calculated for Session 13
    Session 14: Processing 7239 HR values
Calculating HRV for T04 Michelle, Session 14
  Processing 7239 RR intervals
    ✓ HRV calculated for Session 14

--- Processing T05 Felicitas ---
  Loading data...
  ✓ Loaded 173434 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 6 sessions: [2, 4, 6, 9, 13, 14]
    Session 2: Processing 35341 HR values
Calculating HRV for T05 Felicitas, Session 2
  Processing 35341 RR intervals
    ✓ HRV calculated for Session 2
    Session 4: Processing 44233 HR values
Calculating HRV for T05 Felicitas, Session 4
  Processing 44233 RR intervals
    ✓ HRV calculated for Session 4
    Session 6: Processing 24755 HR values
Calculating HRV for T05 Felicitas, Session 6
  Processing 24755 RR intervals
    ✓ HRV calculated for Session 6
    Session 9: Processing 36326 HR values
Calculating HRV for T05 Felicitas, Session 9
  Processing 36326 RR intervals
    ✓ HRV calculated for Session 9
    Session 13: Processing 28365 HR values
Calculating HRV for T05 Felicitas, Session 13
  Processing 28365 RR intervals
    ✓ HRV calculated for Session 13
    Session 14: Processing 4402 HR values
Calculating HRV for T05 Felicitas, Session 14
  Processing 4402 RR intervals
    ✓ HRV calculated for Session 14

--- Processing T06 Mara Selena ---
  Loading data...
  ✓ Loaded 144295 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 6 sessions: [3, 5, 10, 11, 12, 14]
    Session 3: Processing 33113 HR values
Calculating HRV for T06 Mara Selena, Session 3
  Processing 33113 RR intervals
    ✓ HRV calculated for Session 3
    Session 5: Processing 39842 HR values
Calculating HRV for T06 Mara Selena, Session 5
  Processing 39842 RR intervals
    ✓ HRV calculated for Session 5
    Session 10: Processing 5036 HR values
Calculating HRV for T06 Mara Selena, Session 10
  Processing 5036 RR intervals
    ✓ HRV calculated for Session 10
    Session 11: Processing 22907 HR values
Calculating HRV for T06 Mara Selena, Session 11
  Processing 22907 RR intervals
    ✓ HRV calculated for Session 11
    Session 12: Processing 13604 HR values
Calculating HRV for T06 Mara Selena, Session 12
  Processing 13604 RR intervals
    ✓ HRV calculated for Session 12
    Session 14: Processing 29781 HR values
Calculating HRV for T06 Mara Selena, Session 14
  Processing 29781 RR intervals
    ✓ HRV calculated for Session 14

--- Processing T07 Geraldinn ---
  Loading data...
  ✓ Loaded 94301 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 4 sessions: [3, 5, 6, 12]
    Session 3: Processing 24549 HR values
Calculating HRV for T07 Geraldinn, Session 3
  Processing 24549 RR intervals
    ✓ HRV calculated for Session 3
    Session 5: Processing 29711 HR values
Calculating HRV for T07 Geraldinn, Session 5
  Processing 29711 RR intervals
    ✓ HRV calculated for Session 5
    Session 6: Processing 24615 HR values
Calculating HRV for T07 Geraldinn, Session 6
  Processing 24615 RR intervals
    ✓ HRV calculated for Session 6
    Session 12: Processing 15418 HR values
Calculating HRV for T07 Geraldinn, Session 12
  Processing 15418 RR intervals
    ✓ HRV calculated for Session 12

--- Processing T08 Karina ---
  Loading data...
  ✓ Loaded 57872 rows
  ✓ Found heart rate column: heart_rate [bpm]
  Found 2 sessions: [3, 12]
    Session 3: Processing 32919 HR values
Calculating HRV for T08 Karina, Session 3
  Processing 32919 RR intervals
    ✓ HRV calculated for Session 3
    Session 12: Processing 24949 HR values
Calculating HRV for T08 Karina, Session 12
  Processing 24949 RR intervals
    ✓ HRV calculated for Session 12

=== PROCESSING SUMMARY ===
Successfully calculated HRV for: 37 sessions
Failed sessions: 0

=== HRV DATASET CREATED ===
Shape: (37, 21)
Subjects: 8
Sessions: 37
Available HRV metrics: ['SDNN', 'RMSSD', 'pNN50', 'HR_mean', 'HR_std', 'RR_mean', 'RR_std', 'VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_Normalized', 'HF_Normalized', 'LF_HF_Ratio', 'SD1', 'SD2', 'SD1_SD2_Ratio', 'Ellipse_Area']

=== SAMPLE HRV DATA ===
     Subject  Session        SDNN      RMSSD    LF_Power   HF_Power  \
0   T01 Mara        2  108.528914  14.772640  223.822775  71.086921   
1   T01 Mara        3  110.298270  11.806051  193.929879  14.592620   
2   T01 Mara       10  171.322839  11.563091  135.054947  36.601861   
3   T01 Mara       15   94.245965  15.383562  325.020931  74.990655   
4   T01 Mara       16  105.316911   7.155712  205.084748  16.570475   
5   T01 Mara        6   90.495863  10.893747  184.442797  11.306659   
6   T01 Mara        7   88.777988  15.245083  203.480996  33.402660   
7   T01 Mara       12  190.873279  12.166433  169.850122  46.961644   
8  T02 Laura        2  124.214879   9.537965   48.276945  12.576970   
9  T02 Laura        3  130.671716   9.879465  243.402890  14.862179   

   LF_HF_Ratio        SD1         SD2  
0     3.148579  10.445993  153.124982  
1    13.289586   8.348474  155.752651  
2     3.689838   8.176404  242.148984  
3     4.334152  10.877948  132.837078  
4    12.376516   5.059920  148.847856  
5    16.312758   7.704886  127.777812  
6     6.091760  10.780911  124.979032  
7     3.616784   8.603033  269.799877  
8     3.838520   6.744467  175.525013  
9    16.377335   6.985981  184.660942  

✓ HRV metrics saved to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\calculated_hrv_metrics.csv

✓ Dataset ready for sympathetic ANS analysis
Available sympathetic metrics: ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'HF_Normalized', 'VLF_Power', 'Total_Power', 'SDNN', 'RMSSD', 'SD1', 'SD2']
In [14]:
# Cell 12: Comprehensive Sympathetic ANS Analysis with Calculated HRV Metrics
print("=== COMPREHENSIVE SYMPATHETIC ANS ANALYSIS ===")

if 'sympathetic_df' in locals() and len(sympathetic_df) > 0:
    print(f"Analyzing {len(sympathetic_df)} HRV measurements from {sympathetic_df['Subject'].nunique()} subjects")
    print(f"Sessions per subject: {sympathetic_df.groupby('Subject').size().to_dict()}")
    
    # 1. DESCRIPTIVE STATISTICS
    print("\n=== DESCRIPTIVE STATISTICS FOR SYMPATHETIC HRV METRICS ===")
    sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
    available_sympathetic = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
    
    if available_sympathetic:
        desc_stats = sympathetic_df[available_sympathetic].describe()
        print(desc_stats.round(3))
        
        # 2. NORMALITY TESTING
        print("\n=== NORMALITY TESTING ===")
        from scipy.stats import shapiro, normaltest
        
        for metric in available_sympathetic:
            data = sympathetic_df[metric].dropna()
            if len(data) >= 3:
                shapiro_stat, shapiro_p = shapiro(data)
                print(f"{metric}: Shapiro-Wilk p = {shapiro_p:.4f} ({'Normal' if shapiro_p > 0.05 else 'Non-normal'})")
        
        # 3. TEMPORAL ANALYSIS
        print("\n=== TEMPORAL TRENDS ANALYSIS ===")
        if 'Sol' in sympathetic_df.columns:
            temporal_results = []
            for metric in available_sympathetic:
                temporal_data = sympathetic_df.groupby('Sol')[metric].mean().reset_index()
                if len(temporal_data) >= 3:
                    corr_r, corr_p = pearsonr(temporal_data['Sol'], temporal_data[metric])
                    temporal_results.append({
                        'Metric': metric,
                        'Correlation_r': corr_r,
                        'P_value': corr_p,
                        'Significant': 'Yes' if corr_p < 0.05 else 'No'
                    })
                    print(f"{metric}: r = {corr_r:.3f}, p = {corr_p:.4f} ({'Significant' if corr_p < 0.05 else 'Not significant'})")
        
        # 4. CORRELATION ANALYSIS
        print("\n=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===")
        correlation_matrix = sympathetic_df[available_sympathetic].corr()
        print("Correlation Matrix:")
        print(correlation_matrix.round(3))
        
        # Significant correlations
        print("\nSignificant correlations (p < 0.05):")
        from scipy.stats import pearsonr
        for i, metric1 in enumerate(available_sympathetic):
            for j, metric2 in enumerate(available_sympathetic):
                if i < j:
                    paired_data = sympathetic_df[[metric1, metric2]].dropna()
                    if len(paired_data) >= 3:
                        r, p = pearsonr(paired_data[metric1], paired_data[metric2])
                        if p < 0.05:
                            effect_size = "Large" if abs(r) >= 0.5 else "Medium" if abs(r) >= 0.3 else "Small"
                            print(f"  {metric1} - {metric2}: r = {r:.3f}, p = {p:.4f} ({effect_size} effect)")
        
        # 5. VISUALIZATION
        print("\n=== CREATING VISUALIZATIONS ===")
        
        # Create comprehensive visualization
        fig, axes = plt.subplots(2, 3, figsize=(18, 12))
        axes = axes.flatten()
        
        # Plot 1: Box plots by subject for LF/HF ratio
        if 'LF_HF_Ratio' in sympathetic_df.columns:
            sns.boxplot(data=sympathetic_df, x='Subject', y='LF_HF_Ratio', ax=axes[0])
            axes[0].set_title('LF/HF Ratio by Subject (Sympathovagal Balance)', fontweight='bold')
            axes[0].tick_params(axis='x', rotation=45)
            axes[0].grid(True, alpha=0.3)
        
        # Plot 2: Temporal trends for key sympathetic metric
        key_metric = 'LF_HF_Ratio' if 'LF_HF_Ratio' in sympathetic_df.columns else available_sympathetic[0]
        if 'Sol' in sympathetic_df.columns:
            temporal_means = sympathetic_df.groupby('Sol')[key_metric].mean().reset_index()
            temporal_stds = sympathetic_df.groupby('Sol')[key_metric].std().reset_index()
            
            axes[1].errorbar(temporal_means['Sol'], temporal_means[key_metric], 
                            yerr=temporal_stds[key_metric], marker='o', capsize=5)
            axes[1].set_title(f'{key_metric} Temporal Changes', fontweight='bold')
            axes[1].set_xlabel('Recording Day (Sol)')
            axes[1].grid(True, alpha=0.3)
        
        # Plot 3: Correlation heatmap
        mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
        sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='RdBu_r', center=0,
                   ax=axes[2], square=True, fmt='.2f')
        axes[2].set_title('Sympathetic Metrics Correlations', fontweight='bold')
        
        # Plot 4: Distribution of LF Power
        if 'LF_Power' in sympathetic_df.columns:
            axes[3].hist(sympathetic_df['LF_Power'].dropna(), bins=20, alpha=0.7, edgecolor='black')
            axes[3].set_title('LF Power Distribution', fontweight='bold')
            axes[3].set_xlabel('LF Power (ms²)')
            axes[3].grid(True, alpha=0.3)
        
        # Plot 5: SDNN vs LF/HF Ratio scatter
        if 'SDNN' in sympathetic_df.columns and 'LF_HF_Ratio' in sympathetic_df.columns:
            scatter_data = sympathetic_df[['SDNN', 'LF_HF_Ratio']].dropna()
            axes[4].scatter(scatter_data['SDNN'], scatter_data['LF_HF_Ratio'], alpha=0.6)
            axes[4].set_title('SDNN vs LF/HF Ratio', fontweight='bold')
            axes[4].set_xlabel('SDNN (ms)')
            axes[4].set_ylabel('LF/HF Ratio')
            axes[4].grid(True, alpha=0.3)
        
        # Plot 6: Summary statistics table
        # Calculate metrics safely to avoid errors
        n_measurements = len(sympathetic_df)
        n_subjects = sympathetic_df['Subject'].nunique()
        n_sessions = sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 'N/A'
        
        # Calculate key metric statistics safely
        lf_hf_stats = ""
        if 'LF_HF_Ratio' in sympathetic_df.columns:
            lf_hf_mean = sympathetic_df['LF_HF_Ratio'].mean()
            lf_hf_std = sympathetic_df['LF_HF_Ratio'].std()
            lf_hf_stats = f"{lf_hf_mean:.2f} ± {lf_hf_std:.2f}"
        else:
            lf_hf_stats = "N/A"
            
        lf_power_stats = ""
        if 'LF_Power' in sympathetic_df.columns:
            lf_power_mean = sympathetic_df['LF_Power'].mean()
            lf_power_std = sympathetic_df['LF_Power'].std()
            lf_power_stats = f"{lf_power_mean:.1f} ± {lf_power_std:.1f} ms²"
        else:
            lf_power_stats = "N/A"
            
        sdnn_stats = ""
        if 'SDNN' in sympathetic_df.columns:
            sdnn_mean = sympathetic_df['SDNN'].mean()
            sdnn_std = sympathetic_df['SDNN'].std()
            sdnn_stats = f"{sdnn_mean:.1f} ± {sdnn_std:.1f} ms"
        else:
            sdnn_stats = "N/A"
        
        # Calculate statistical findings
        normal_count = 0
        try:
            for metric in available_sympathetic:
                data = sympathetic_df[metric].dropna()
                if len(data) >= 3:
                    _, p_value = shapiro(data)
                    if p_value > 0.05:
                        normal_count += 1
        except:
            normal_count = 0
            
        temporal_sig = 'N/A'
        temporal_total = 'N/A'
        if 'temporal_results' in locals() and temporal_results:
            temporal_sig = len([r for r in temporal_results if r['Significant'] == 'Yes'])
            temporal_total = len(temporal_results)
        
        summary_text = f"""SYMPATHETIC ANS ANALYSIS SUMMARY
        
Dataset Characteristics:
• Total HRV measurements: {n_measurements}
• Unique subjects: {n_subjects}
• Recording sessions: {n_sessions}

Key Sympathetic Metrics (Mean ± SD):
• LF/HF Ratio: {lf_hf_stats}
• LF Power: {lf_power_stats}
• SDNN: {sdnn_stats}

Statistical Findings:
• Normal distributions: {normal_count}/{len(available_sympathetic)}
• Temporal correlations: {temporal_sig}/{temporal_total}

Clinical Interpretation:
✓ HRV-based sympathetic assessment completed
✓ Individual and temporal variability characterized
✓ Statistical validation performed
✓ Results ready for clinical application"""
        
        axes[5].text(0.1, 0.5, summary_text, transform=axes[5].transAxes, fontsize=10,
                    verticalalignment='center', family='monospace',
                    bbox=dict(boxstyle='round,pad=1', facecolor='lightgray', alpha=0.8))
        axes[5].set_xlim(0, 1)
        axes[5].set_ylim(0, 1)
        axes[5].axis('off')
        axes[5].set_title('Analysis Summary', fontweight='bold')
        
        plt.tight_layout()
        #plt.suptitle('Sympathetic Autonomic Nervous System Analysis\nCalculated from Heart Rate Variability', 
                     #y=0.98, fontsize=16, fontweight='bold')
        plt.show()
        
        # 6. EXPORT RESULTS
        print("\n=== EXPORTING ANALYSIS RESULTS ===")
        
        # Save processed data
        results_path = os.path.join(data_dir, 'sympathetic_ans_results.csv')
        sympathetic_df.to_csv(results_path, index=False)
        print(f"✓ Analysis results saved to: {results_path}")
        
        # Create summary report
        report_path = os.path.join(data_dir, 'sympathetic_analysis_report.txt')
        with open(report_path, 'w') as f:
            f.write("SYMPATHETIC ANS ANALYSIS REPORT\n")
            f.write("="*50 + "\n\n")
            f.write(f"Dataset: {len(sympathetic_df)} HRV measurements\n")
            f.write(f"Subjects: {sympathetic_df['Subject'].nunique()}\n")
            f.write(f"Sessions: {sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 'N/A'}\n\n")
            
            f.write("DESCRIPTIVE STATISTICS:\n")
            f.write(desc_stats.to_string())
            f.write("\n\n")
            
            if 'temporal_results' in locals():
                f.write("TEMPORAL ANALYSIS:\n")
                for result in temporal_results:
                    f.write(f"{result['Metric']}: r={result['Correlation_r']:.3f}, p={result['P_value']:.4f} ({result['Significant']})\n")
            
        print(f"✓ Analysis report saved to: {report_path}")
        
        print("\n=== ANALYSIS COMPLETED SUCCESSFULLY ===")
        print("✓ HRV metrics calculated from heart rate data")
        print("✓ Sympathetic ANS analysis performed")
        print("✓ Statistical validation completed")
        print("✓ Visualizations generated")
        print("✓ Results exported")
        
    else:
        print("No sympathetic HRV metrics available for analysis")
    
else:
    print("No HRV data available. Please run the data processing cells first.")
=== COMPREHENSIVE SYMPATHETIC ANS ANALYSIS ===
Analyzing 37 HRV measurements from 8 subjects
Sessions per subject: {'T01 Mara': 8, 'T02 Laura': 3, 'T03 Nancy': 4, 'T04 Michelle': 4, 'T05 Felicitas': 6, 'T06 Mara Selena': 6, 'T07 Geraldinn': 4, 'T08 Karina': 2}

=== DESCRIPTIVE STATISTICS FOR SYMPATHETIC HRV METRICS ===
       LF_Power  LF_HF_Ratio  LF_Normalized     SDNN      SD2
count    37.000       37.000         37.000   37.000   37.000
mean    454.414       13.273         89.974  125.294  176.892
std     659.793        8.583          6.209   47.363   66.809
min      48.277        3.149         75.895   59.816   84.354
25%     203.481        6.206         86.123   90.496  127.778
50%     268.743       12.404         92.540  110.298  155.753
75%     356.840       16.377         94.245  168.817  238.624
max    3584.269       35.624         97.270  239.834  338.781

=== NORMALITY TESTING ===
LF_Power: Shapiro-Wilk p = 0.0000 (Non-normal)
LF_HF_Ratio: Shapiro-Wilk p = 0.0019 (Non-normal)
LF_Normalized: Shapiro-Wilk p = 0.0004 (Non-normal)
SDNN: Shapiro-Wilk p = 0.0045 (Non-normal)
SD2: Shapiro-Wilk p = 0.0047 (Non-normal)

=== TEMPORAL TRENDS ANALYSIS ===
LF_Power: r = 0.056, p = 0.8485 (Not significant)
LF_HF_Ratio: r = 0.295, p = 0.3065 (Not significant)
LF_Normalized: r = 0.161, p = 0.5822 (Not significant)
SDNN: r = -0.291, p = 0.3120 (Not significant)
SD2: r = -0.292, p = 0.3108 (Not significant)

=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===
Correlation Matrix:
               LF_Power  LF_HF_Ratio  LF_Normalized   SDNN    SD2
LF_Power          1.000       -0.227         -0.252  0.475  0.471
LF_HF_Ratio      -0.227        1.000          0.831 -0.684 -0.684
LF_Normalized    -0.252        0.831          1.000 -0.648 -0.648
SDNN              0.475       -0.684         -0.648  1.000  1.000
SD2               0.471       -0.684         -0.648  1.000  1.000

Significant correlations (p < 0.05):
  LF_Power - SDNN: r = 0.475, p = 0.0030 (Medium effect)
  LF_Power - SD2: r = 0.471, p = 0.0033 (Medium effect)
  LF_HF_Ratio - LF_Normalized: r = 0.831, p = 0.0000 (Large effect)
  LF_HF_Ratio - SDNN: r = -0.684, p = 0.0000 (Large effect)
  LF_HF_Ratio - SD2: r = -0.684, p = 0.0000 (Large effect)
  LF_Normalized - SDNN: r = -0.648, p = 0.0000 (Large effect)
  LF_Normalized - SD2: r = -0.648, p = 0.0000 (Large effect)
  SDNN - SD2: r = 1.000, p = 0.0000 (Large effect)

=== CREATING VISUALIZATIONS ===
No description has been provided for this image
=== EXPORTING ANALYSIS RESULTS ===
✓ Analysis results saved to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\sympathetic_ans_results.csv
✓ Analysis report saved to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\sympathetic_analysis_report.txt

=== ANALYSIS COMPLETED SUCCESSFULLY ===
✓ HRV metrics calculated from heart rate data
✓ Sympathetic ANS analysis performed
✓ Statistical validation completed
✓ Visualizations generated
✓ Results exported
In [15]:
# Cell 13: Complete Analysis Execution (Run after Cell 11)
print("=== COMPLETE SYMPATHETIC ANS ANALYSIS EXECUTION ===")
print("This cell runs the complete analysis workflow after HRV metrics are calculated")

if 'sympathetic_df' in locals() and not sympathetic_df.empty:
    print(f"\n✓ Found HRV data: {len(sympathetic_df)} measurements from {sympathetic_df['Subject'].nunique()} subjects")
    
    # Update numeric_cols for analysis
    numeric_metrics = [col for col in sympathetic_df.columns 
                      if col not in ['Subject', 'Sol', 'RR_Count'] 
                      and pd.api.types.is_numeric_dtype(sympathetic_df[col])]
    
    print(f"✓ Available metrics: {numeric_metrics}")
    
    # Execute all analysis components
    print("\n" + "="*50)
    print("EXECUTING COMPLETE SYMPATHETIC ANS ANALYSIS")
    print("="*50)
    
    # 1. DESCRIPTIVE STATISTICS
    print("\n1. DESCRIPTIVE STATISTICS")
    sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
    available_sympathetic = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
    
    if available_sympathetic:
        desc_stats = sympathetic_df[available_sympathetic].describe()
        print(desc_stats.round(3))
        
        # 2. NORMALITY TESTING
        print("\n2. NORMALITY TESTING")
        normality_results = []
        for metric in available_sympathetic:
            data = sympathetic_df[metric].dropna()
            if len(data) >= 3:
                try:
                    shapiro_stat, shapiro_p = shapiro(data)
                    normality_results.append({
                        'Metric': metric,
                        'N': len(data),
                        'Shapiro_p': shapiro_p,
                        'Normal': 'Yes' if shapiro_p > 0.05 else 'No'
                    })
                    print(f"{metric}: Shapiro-Wilk p = {shapiro_p:.4f} ({'Normal' if shapiro_p > 0.05 else 'Non-normal'})")
                except Exception as e:
                    print(f"{metric}: Error in normality test - {str(e)}")
        
        # 3. TEMPORAL ANALYSIS
        print("\n3. TEMPORAL TRENDS ANALYSIS")
        if 'Sol' in sympathetic_df.columns:
            temporal_results = []
            for metric in available_sympathetic:
                temporal_data = sympathetic_df.groupby('Sol')[metric].mean().reset_index()
                if len(temporal_data) >= 3:
                    corr_r, corr_p = pearsonr(temporal_data['Sol'], temporal_data[metric])
                    temporal_results.append({
                        'Metric': metric,
                        'Correlation_r': corr_r,
                        'P_value': corr_p,
                        'Significant': 'Yes' if corr_p < 0.05 else 'No'
                    })
                    print(f"{metric}: r = {corr_r:.3f}, p = {corr_p:.4f} ({'Significant' if corr_p < 0.05 else 'Not significant'})")
        
        # 4. CORRELATION ANALYSIS
        print("\n4. INTER-METRIC CORRELATIONS")
        correlation_results = []
        for i, metric1 in enumerate(available_sympathetic):
            for j, metric2 in enumerate(available_sympathetic):
                if i < j:
                    paired_data = sympathetic_df[[metric1, metric2]].dropna()
                    if len(paired_data) >= 3:
                        try:
                            r, p = pearsonr(paired_data[metric1], paired_data[metric2])
                            if p < 0.05:
                                effect_size = "Large" if abs(r) >= 0.5 else "Medium" if abs(r) >= 0.3 else "Small"
                                correlation_results.append({
                                    'Metric_1': metric1, 'Metric_2': metric2,
                                    'Correlation_r': r, 'P_value': p, 'Effect_Size': effect_size
                                })
                                print(f"  {metric1} - {metric2}: r = {r:.3f}, p = {p:.4f} ({effect_size} effect)")
                        except Exception as e:
                            print(f"  Error in {metric1}-{metric2} correlation: {str(e)}")
        
        # 5. SUMMARY RESULTS
        print(f"\n" + "="*50)
        print("ANALYSIS SUMMARY")
        print("="*50)
        print(f"• Dataset: {len(sympathetic_df)} HRV measurements")
        print(f"• Subjects: {sympathetic_df['Subject'].nunique()}")
        print(f"• Sessions: {sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 'N/A'}")
        
        # Key findings
        normal_metrics = len([r for r in normality_results if r['Normal'] == 'Yes']) if normality_results else 0
        sig_temporal = len([r for r in temporal_results if r['Significant'] == 'Yes']) if 'temporal_results' in locals() and temporal_results else 0
        sig_correlations = len(correlation_results)
        
        print(f"• Normal distributions: {normal_metrics}/{len(available_sympathetic)}")
        print(f"• Significant temporal trends: {sig_temporal}")
        print(f"• Significant correlations: {sig_correlations}")
        
        # Export results
        if len(sympathetic_df) > 0:
            results_path = os.path.join(data_dir, 'sympathetic_ans_results.csv')
            sympathetic_df.to_csv(results_path, index=False)
            print(f"\n✓ Results exported to: {results_path}")
        
        print(f"\n✓ SYMPATHETIC ANS ANALYSIS COMPLETED SUCCESSFULLY")
    else:
        print("✗ No sympathetic metrics available for analysis")
        
else:
    print("\n✗ No HRV data available. Please run Cell 11 first to calculate HRV metrics.")
    print("The analysis workflow requires calculated HRV metrics from heart rate data.")
=== COMPLETE SYMPATHETIC ANS ANALYSIS EXECUTION ===
This cell runs the complete analysis workflow after HRV metrics are calculated

✓ Found HRV data: 37 measurements from 8 subjects
✓ Available metrics: ['SDNN', 'RMSSD', 'pNN50', 'HR_mean', 'HR_std', 'RR_mean', 'RR_std', 'VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_Normalized', 'HF_Normalized', 'LF_HF_Ratio', 'SD1', 'SD2', 'SD1_SD2_Ratio', 'Ellipse_Area']

==================================================
EXECUTING COMPLETE SYMPATHETIC ANS ANALYSIS
==================================================

1. DESCRIPTIVE STATISTICS
       LF_Power  LF_HF_Ratio  LF_Normalized     SDNN      SD2
count    37.000       37.000         37.000   37.000   37.000
mean    454.414       13.273         89.974  125.294  176.892
std     659.793        8.583          6.209   47.363   66.809
min      48.277        3.149         75.895   59.816   84.354
25%     203.481        6.206         86.123   90.496  127.778
50%     268.743       12.404         92.540  110.298  155.753
75%     356.840       16.377         94.245  168.817  238.624
max    3584.269       35.624         97.270  239.834  338.781

2. NORMALITY TESTING
LF_Power: Shapiro-Wilk p = 0.0000 (Non-normal)
LF_HF_Ratio: Shapiro-Wilk p = 0.0019 (Non-normal)
LF_Normalized: Shapiro-Wilk p = 0.0004 (Non-normal)
SDNN: Shapiro-Wilk p = 0.0045 (Non-normal)
SD2: Shapiro-Wilk p = 0.0047 (Non-normal)

3. TEMPORAL TRENDS ANALYSIS
LF_Power: r = 0.056, p = 0.8485 (Not significant)
LF_HF_Ratio: r = 0.295, p = 0.3065 (Not significant)
LF_Normalized: r = 0.161, p = 0.5822 (Not significant)
SDNN: r = -0.291, p = 0.3120 (Not significant)
SD2: r = -0.292, p = 0.3108 (Not significant)

4. INTER-METRIC CORRELATIONS
  LF_Power - SDNN: r = 0.475, p = 0.0030 (Medium effect)
  LF_Power - SD2: r = 0.471, p = 0.0033 (Medium effect)
  LF_HF_Ratio - LF_Normalized: r = 0.831, p = 0.0000 (Large effect)
  LF_HF_Ratio - SDNN: r = -0.684, p = 0.0000 (Large effect)
  LF_HF_Ratio - SD2: r = -0.684, p = 0.0000 (Large effect)
  LF_Normalized - SDNN: r = -0.648, p = 0.0000 (Large effect)
  LF_Normalized - SD2: r = -0.648, p = 0.0000 (Large effect)
  SDNN - SD2: r = 1.000, p = 0.0000 (Large effect)

==================================================
ANALYSIS SUMMARY
==================================================
• Dataset: 37 HRV measurements
• Subjects: 8
• Sessions: 14
• Normal distributions: 0/5
• Significant temporal trends: 0
• Significant correlations: 8

✓ Results exported to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\sympathetic_ans_results.csv

✓ SYMPATHETIC ANS ANALYSIS COMPLETED SUCCESSFULLY
In [16]:
# Load existing sympathetic analysis results and define required variables
import pandas as pd
import numpy as np
from scipy import stats

# Load the existing sympathetic analysis results
sympathetic_df = pd.read_csv('../sympathetic_ans_results.csv')
print(f"✓ Loaded sympathetic analysis results: {len(sympathetic_df)} rows, {len(sympathetic_df.columns)} columns")
print(f"✓ Columns: {list(sympathetic_df.columns)}")

# Basic dataset statistics
total_sessions = len(sympathetic_df)
unique_subjects = sympathetic_df['Subject'].nunique() if 'Subject' in sympathetic_df.columns else 0
unique_sols = sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 0

if 'Sol' in sympathetic_df.columns:
    sol_range = f"{sympathetic_df['Sol'].min()}-{sympathetic_df['Sol'].max()}"
else:
    sol_range = "Unknown"

# Define available metrics for analysis
sympathetic_metrics = ['SDNN', 'RMSSD', 'LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SD1', 'SD2']
available_metrics = [m for m in sympathetic_metrics if m in sympathetic_df.columns]

# Count significant trends (simplified version)
significant_trends = 0
if 'Sol' in sympathetic_df.columns:
    for metric in available_metrics:
        if sympathetic_df[metric].notna().sum() > 5:
            try:
                # Simple correlation with Sol number as proxy for temporal trend
                correlation, p_value = stats.spearmanr(sympathetic_df['Sol'].dropna(), 
                                                     sympathetic_df[metric].dropna())
                if p_value < 0.05:
                    significant_trends += 1
            except:
                continue

# Count strong correlations
strong_correlations = 0
if len(available_metrics) >= 2:
    try:
        correlation_matrix = sympathetic_df[available_metrics].corr(method='spearman')
        # Count correlations with |r| >= 0.5 and exclude diagonal
        for i in range(len(correlation_matrix.columns)):
            for j in range(i+1, len(correlation_matrix.columns)):
                corr_val = correlation_matrix.iloc[i, j]
                if not pd.isna(corr_val) and abs(corr_val) >= 0.5:
                    strong_correlations += 1
    except:
        strong_correlations = 0

print(f"\n✓ Variables defined successfully:")
print(f"- total_sessions: {total_sessions}")
print(f"- unique_subjects: {unique_subjects}")
print(f"- unique_sols: {unique_sols}")
print(f"- sol_range: {sol_range}")
print(f"- available_metrics: {len(available_metrics)} metrics: {available_metrics}")
print(f"- significant_trends: {significant_trends}")
print(f"- strong_correlations: {strong_correlations}")

# Show sample of the data
print(f"\n✓ Sample of sympathetic_df:")
print(sympathetic_df.head())
✓ Loaded sympathetic analysis results: 37 rows, 21 columns
✓ Columns: ['SDNN', 'RMSSD', 'pNN50', 'HR_mean', 'HR_std', 'RR_mean', 'RR_std', 'VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_Normalized', 'HF_Normalized', 'LF_HF_Ratio', 'SD1', 'SD2', 'SD1_SD2_Ratio', 'Ellipse_Area', 'Subject', 'Sol', 'RR_Count']

✓ Variables defined successfully:
- total_sessions: 37
- unique_subjects: 8
- unique_sols: 14
- sol_range: 2-16
- available_metrics: 8 metrics: ['SDNN', 'RMSSD', 'LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SD1', 'SD2']
- significant_trends: 0
- strong_correlations: 22

✓ Sample of sympathetic_df:
         SDNN      RMSSD     pNN50     HR_mean     HR_std     RR_mean  \
0  108.528914  14.772640  0.774272   88.731455  15.284720  676.197634   
1  110.298270  11.806051  0.494488   94.616818  20.337990  634.136732   
2  171.322839  11.563091  0.265995   69.231403  17.599060  866.658733   
3   94.245965  15.383562  0.264649   91.951276  13.236271  652.519496   
4  105.316911   7.155712  0.136184  106.669736  23.491148  562.483815   

       RR_std    VLF_Power    LF_Power   HF_Power  ...  LF_Normalized  \
0  108.528914  1280.199890  223.822775  71.086921  ...      75.895360   
1  110.298270  1609.190018  193.929879  14.592620  ...      93.001896   
2  171.322839  3711.328108  135.054947  36.601861  ...      78.677303   
3   94.245965  1515.813095  325.020931  74.990655  ...      81.252879   
4  105.316911  1170.628318  205.084748  16.570475  ...      92.524212   

   HF_Normalized  LF_HF_Ratio        SD1         SD2  SD1_SD2_Ratio  \
0      24.104640     3.148579  10.445993  153.124982       0.068219   
1       6.998104    13.289586   8.348474  155.752651       0.053601   
2      21.322697     3.689838   8.176404  242.148984       0.033766   
3      18.747121     4.334152  10.877948  132.837078       0.081889   
4       7.475788    12.376516   5.059920  148.847856       0.033994   

   Ellipse_Area   Subject Sol  RR_Count  
0   5025.110948  T01 Mara   2     32806  
1   4085.003417  T01 Mara   3     12337  
2   6220.063976  T01 Mara  10     63536  
3   4539.585002  T01 Mara  15     42699  
4   2366.116516  T01 Mara  16     36716  

[5 rows x 21 columns]
In [17]:
# Cell 14: Comprehensive SOL-to-SOL Analysis with Complete Results Summary
print("=== COMPREHENSIVE SOL-TO-SOL SYMPATHETIC ANS ANALYSIS ===")
print("Detailed analysis of metrics behavior across recording sessions with complete conclusions")

if 'sympathetic_df' in locals() and not sympathetic_df.empty:
    
    # Enhanced error handling and data validation
    print(f"\n✓ Dataset validated: {len(sympathetic_df)} measurements from {sympathetic_df['Subject'].nunique()} subjects")
    
    # Define sympathetic metrics for analysis
    sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2', 'VLF_Power', 'Total_Power']
    available_metrics = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
    
    # 1. SOL-BY-SOL DETAILED STATISTICS
    print(f"\n{'='*60}")
    print("SOL-BY-SOL SYMPATHETIC METRICS ANALYSIS")
    print(f"{'='*60}")
    
    sol_summary = {}
    for sol in sorted(sympathetic_df['Sol'].unique()):
        # Explicitly ensure we're working with a DataFrame
        sol_data = sympathetic_df[sympathetic_df['Sol'] == sol].copy()
        
        # Safe access to DataFrame methods
        n_subjects = len(sol_data['Subject'].unique()) if 'Subject' in sol_data.columns else 0
        unique_subjects = sol_data['Subject'].unique().tolist() if 'Subject' in sol_data.columns else []
        
        sol_summary[sol] = {
            'n_measurements': len(sol_data),
            'n_subjects': n_subjects,
            'subjects': unique_subjects
        }
        
        print(f"\nSOL {int(sol)} Summary:")
        print(f"  • Measurements: {len(sol_data)}")
        if 'Subject' in sol_data.columns:
            print(f"  • Subjects: {n_subjects} ({', '.join(unique_subjects)})")
        
        # Calculate metrics statistics for this SOL
        for metric in available_metrics:
            if metric in sol_data.columns:
                metric_series = sol_data[metric]
                metric_data = metric_series[metric_series.notna()]
                if len(metric_data) > 0:
                    mean_val = metric_data.mean()
                    std_val = metric_data.std()
                    median_val = metric_data.median()
                    print(f"  • {metric}: {mean_val:.2f} ± {std_val:.2f} (median: {median_val:.2f})")
    
    # 2. COMPREHENSIVE SOL-TO-SOL VISUALIZATIONS
    print(f"\n{'='*60}")
    print("CREATING SOL-TO-SOL VISUALIZATIONS")
    print(f"{'='*60}")
    
    # Create comprehensive visualization dashboard
    fig = plt.figure(figsize=(20, 24))
    
    # Plot 1: SOL-to-SOL trends for all key metrics
    plt.subplot(4, 3, 1)
    colors = plt.cm.Set1(np.linspace(0, 1, len(available_metrics)))
    for i, metric in enumerate(available_metrics[:5]):  # Limit to 5 for clarity
        sol_means = sympathetic_df.groupby('Sol')[metric].mean()
        plt.plot(sol_means.index, sol_means.values, 'o-', label=metric, 
                color=colors[i], linewidth=2, markersize=6)
    plt.title('Sympathetic Metrics Across SOLs\n(Mean Values)', fontweight='bold', fontsize=12)
    plt.xlabel('SOL (Recording Day)')
    plt.ylabel('Metric Value')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)
    
    # Plot 2: LF/HF Ratio detailed SOL analysis
    plt.subplot(4, 3, 2)
    if 'LF_HF_Ratio' in sympathetic_df.columns:
        sol_data = sympathetic_df.groupby('Sol')['LF_HF_Ratio'].agg(['mean', 'std', 'count']).reset_index()
        sol_data['sem'] = sol_data['std'] / np.sqrt(sol_data['count'])
        
        plt.errorbar(sol_data['Sol'], sol_data['mean'], yerr=sol_data['sem'], 
                    fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8, color='red')
        plt.fill_between(sol_data['Sol'], 
                        sol_data['mean'] - sol_data['std'], 
                        sol_data['mean'] + sol_data['std'], 
                        alpha=0.2, color='red')
        plt.title('LF/HF Ratio: SOL-to-SOL Progression\n(Sympathovagal Balance)', fontweight='bold')
        plt.xlabel('SOL (Recording Day)')
        plt.ylabel('LF/HF Ratio')
        plt.grid(True, alpha=0.3)
        
        # Add trend line
        from scipy.stats import linregress
        slope, intercept, r_value, p_value, std_err = linregress(sol_data['Sol'], sol_data['mean'])
        trend_line = slope * sol_data['Sol'] + intercept
        plt.plot(sol_data['Sol'], trend_line, '--', color='darkred', alpha=0.8, linewidth=2)
        plt.text(0.05, 0.95, f'Trend: r={r_value:.3f}, p={p_value:.3f}', 
                transform=plt.gca().transAxes, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
    
    # Plot 3: Subject-specific SOL trajectories
    plt.subplot(4, 3, 3)
    if 'LF_HF_Ratio' in sympathetic_df.columns:
        subjects = sympathetic_df['Subject'].unique()[:6]  # Limit for clarity
        colors_subj = plt.cm.tab10(np.linspace(0, 1, len(subjects)))
        
        for i, subject in enumerate(subjects):
            subj_data = sympathetic_df[sympathetic_df['Subject'] == subject]
            if len(subj_data) > 1:
                plt.plot(subj_data['Sol'], subj_data['LF_HF_Ratio'], 
                        'o-', label=subject, color=colors_subj[i], alpha=0.7, linewidth=1.5)
        
        plt.title('Individual Subject Trajectories\n(LF/HF Ratio Across SOLs)', fontweight='bold')
        plt.xlabel('SOL (Recording Day)')
        plt.ylabel('LF/HF Ratio')
        plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
        plt.grid(True, alpha=0.3)
    
    # Plot 4: SDNN SOL progression
    plt.subplot(4, 3, 4)
    if 'SDNN' in sympathetic_df.columns:
        sol_data = sympathetic_df.groupby('Sol')['SDNN'].agg(['mean', 'std', 'count']).reset_index()
        sol_data['sem'] = sol_data['std'] / np.sqrt(sol_data['count'])
        
        plt.errorbar(sol_data['Sol'], sol_data['mean'], yerr=sol_data['sem'], 
                    fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8, color='blue')
        plt.title('SDNN: SOL-to-SOL Changes\n(Overall HRV)', fontweight='bold')
        plt.xlabel('SOL (Recording Day)')
        plt.ylabel('SDNN (ms)')
        plt.grid(True, alpha=0.3)
    
    # Plot 5: LF Power distribution by SOL
    plt.subplot(4, 3, 5)
    if 'LF_Power' in sympathetic_df.columns:
        sol_values = sorted(sympathetic_df['Sol'].unique())
        box_data = []
        for sol in sol_values:
            sol_lf = sympathetic_df[sympathetic_df['Sol'] == sol]['LF_Power'].dropna()
            if len(sol_lf) > 0:
                box_data.append(sol_lf.values)
        
        plt.boxplot(box_data, labels=[f'SOL {int(s)}' for s in sol_values])
        plt.title('LF Power Distribution Across SOLs\n(Sympathetic Activity)', fontweight='bold')
        plt.xlabel('SOL (Recording Day)')
        plt.ylabel('LF Power (ms²)')
        plt.xticks(rotation=45)
        plt.grid(True, alpha=0.3)
    
    # Plot 6: Correlation heatmap of all metrics
    plt.subplot(4, 3, 6)
    correlation_matrix = sympathetic_df[available_metrics].corr()
    mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
    sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='RdBu_r', center=0,
                square=True, fmt='.2f', cbar_kws={"shrink": .8})
    plt.title('Inter-Metric Correlations\n(All Sympathetic Measures)', fontweight='bold')
    
    # Plot 7: Normalized metrics comparison across SOLs
    plt.subplot(4, 3, 7)
    from sklearn.preprocessing import MinMaxScaler
    metrics_to_normalize = ['LF_HF_Ratio', 'SDNN', 'LF_Normalized']
    available_norm = [m for m in metrics_to_normalize if m in sympathetic_df.columns]
    
    if len(available_norm) >= 2:
        scaler = MinMaxScaler()
        sol_means = sympathetic_df.groupby('Sol')[available_norm].mean()
        normalized_data = pd.DataFrame(scaler.fit_transform(sol_means), 
                                     index=sol_means.index, columns=sol_means.columns)
        
        for metric in available_norm:
            plt.plot(normalized_data.index, normalized_data[metric], 'o-', 
                    label=metric, linewidth=2, markersize=6)
        
        plt.title('Normalized Metrics Comparison\n(0-1 Scale Across SOLs)', fontweight='bold')
        plt.xlabel('SOL (Recording Day)')
        plt.ylabel('Normalized Value (0-1)')
        plt.legend()
        plt.grid(True, alpha=0.3)
    
    # Plot 8: Statistical significance heatmap
    plt.subplot(4, 3, 8)
    from scipy.stats import f_oneway
    
    # ANOVA p-values for differences between SOLs
    anova_results = {}
    for metric in available_metrics:
        sol_groups = []
        for sol in sorted(sympathetic_df['Sol'].unique()):
            sol_metric_data = sympathetic_df[sympathetic_df['Sol'] == sol][metric].dropna()
            if len(sol_metric_data) > 0:
                sol_groups.append(sol_metric_data.values)
        
        if len(sol_groups) >= 2:
            try:
                f_stat, p_value = f_oneway(*sol_groups)
                anova_results[metric] = p_value
            except:
                anova_results[metric] = 1.0
    
    if anova_results:
        p_values = list(anova_results.values())
        metric_names = list(anova_results.keys())
        
        # Create significance visualization
        colors = ['red' if p < 0.05 else 'orange' if p < 0.1 else 'gray' for p in p_values]
        bars = plt.bar(range(len(metric_names)), [-np.log10(p) for p in p_values], color=colors)
        plt.axhline(y=-np.log10(0.05), color='red', linestyle='--', label='p=0.05')
        plt.axhline(y=-np.log10(0.1), color='orange', linestyle='--', label='p=0.10')
        plt.title('Statistical Significance\n(-log10 p-values for SOL differences)', fontweight='bold')
        plt.xlabel('Sympathetic Metrics')
        plt.ylabel('-log10(p-value)')
        plt.xticks(range(len(metric_names)), metric_names, rotation=45)
        plt.legend()
        plt.grid(True, alpha=0.3)
    
    # Plot 9: Effect sizes across SOLs
    plt.subplot(4, 3, 9)
    if len(available_metrics) >= 2:
        effect_sizes = []
        metric_labels = []
        
        for metric in available_metrics:
            # Calculate Cohen's d for largest effect between SOLs
            sols = sorted(sympathetic_df['Sol'].unique())
            max_cohens_d = 0
            
            for i in range(len(sols)):
                for j in range(i+1, len(sols)):
                    group1 = sympathetic_df[sympathetic_df['Sol'] == sols[i]][metric].dropna()
                    group2 = sympathetic_df[sympathetic_df['Sol'] == sols[j]][metric].dropna()
                    
                    if len(group1) > 0 and len(group2) > 0:
                        pooled_std = np.sqrt(((len(group1)-1)*group1.std()**2 + (len(group2)-1)*group2.std()**2) / 
                                           (len(group1) + len(group2) - 2))
                        if pooled_std > 0:
                            cohens_d = abs(group1.mean() - group2.mean()) / pooled_std
                            max_cohens_d = max(max_cohens_d, cohens_d)
            
            effect_sizes.append(max_cohens_d)
            metric_labels.append(metric)
        
        colors = ['green' if d >= 0.8 else 'yellow' if d >= 0.5 else 'orange' if d >= 0.2 else 'lightgray' 
                 for d in effect_sizes]
        plt.bar(range(len(metric_labels)), effect_sizes, color=colors)
        plt.axhline(y=0.8, color='green', linestyle='--', label='Large effect')
        plt.axhline(y=0.5, color='gold', linestyle='--', label='Medium effect')
        plt.axhline(y=0.2, color='orange', linestyle='--', label='Small effect')
        plt.title("Maximum Effect Sizes\n(Cohen's d between SOLs)", fontweight='bold')
        plt.xlabel('Sympathetic Metrics')
        plt.ylabel("Cohen's d")
        plt.xticks(range(len(metric_labels)), metric_labels, rotation=45)
        plt.legend()
        plt.grid(True, alpha=0.3)
    
    # Plot 10: Data availability matrix
    plt.subplot(4, 3, 10)
    # Create availability matrix
    availability_matrix = []
    sols = sorted(sympathetic_df['Sol'].unique())
    subjects = sorted(sympathetic_df['Subject'].unique())
    
    for subject in subjects:
        row = []
        for sol in sols:
            has_data = len(sympathetic_df[(sympathetic_df['Subject'] == subject) & 
                                        (sympathetic_df['Sol'] == sol)]) > 0
            row.append(1 if has_data else 0)
        availability_matrix.append(row)
    
    plt.imshow(availability_matrix, cmap='RdYlGn', aspect='auto')
    plt.title('Data Availability Matrix\n(Subjects × SOLs)', fontweight='bold')
    plt.xlabel('SOL (Recording Day)')
    plt.ylabel('Subjects')
    plt.xticks(range(len(sols)), [f'SOL {int(s)}' for s in sols], rotation=45)
    plt.yticks(range(len(subjects)), subjects, fontsize=8)
    plt.colorbar(label='Data Available')
    
    # Plot 11: Temporal stability analysis
    plt.subplot(4, 3, 11)
    if len(available_metrics) >= 3:
        cv_values = []  # Coefficient of variation
        stability_metrics = []
        
        for metric in available_metrics:
            # Calculate CV across SOLs for each subject
            subject_cvs = []
            for subject in sympathetic_df['Subject'].unique():
                subj_data = sympathetic_df[sympathetic_df['Subject'] == subject][metric].dropna()
                if len(subj_data) > 1:
                    cv = subj_data.std() / subj_data.mean() if subj_data.mean() != 0 else 0
                    subject_cvs.append(cv)
            
            if subject_cvs:
                mean_cv = np.mean(subject_cvs)
                cv_values.append(mean_cv)
                stability_metrics.append(metric)
        
        if cv_values:
            colors = ['green' if cv <= 0.2 else 'yellow' if cv <= 0.5 else 'red' for cv in cv_values]
            bars = plt.bar(range(len(stability_metrics)), cv_values, color=colors)
            plt.title('Temporal Stability Analysis\n(Coefficient of Variation)', fontweight='bold')
            plt.xlabel('Sympathetic Metrics')
            plt.ylabel('Mean CV Across Subjects')
            plt.xticks(range(len(stability_metrics)), stability_metrics, rotation=45)
            plt.axhline(y=0.2, color='green', linestyle='--', label='High stability')
            plt.axhline(y=0.5, color='orange', linestyle='--', label='Moderate stability')
            plt.legend()
            plt.grid(True, alpha=0.3)
    
    # Plot 12: Summary statistics panel
    plt.subplot(4, 3, 12)
    
    # Calculate comprehensive statistics
    total_sessions = len(sympathetic_df)
    unique_subjects = sympathetic_df['Subject'].nunique()
    unique_sols = sympathetic_df['Sol'].nunique()
    sol_range = f"{sympathetic_df['Sol'].min()}-{sympathetic_df['Sol'].max()}"
    
    # Calculate key findings
    normal_distributions = 0
    for metric in available_metrics:
        try:
            data = sympathetic_df[metric].dropna()
            if len(data) >= 3:
                _, p = shapiro(data)
                if p > 0.05:
                    normal_distributions += 1
        except:
            pass
    
    significant_trends = 0
    for metric in available_metrics:
        try:
            sol_means = sympathetic_df.groupby('Sol')[metric].mean()
            if len(sol_means) >= 3:
                _, p = pearsonr(sol_means.index, sol_means.values)
                if p < 0.05:
                    significant_trends += 1
        except:
            pass
    
    # Strong correlations
    strong_correlations = 0
    for i, m1 in enumerate(available_metrics):
        for j, m2 in enumerate(available_metrics):
            if i < j:
                try:
                    data_clean = sympathetic_df[(m1, m2)].dropna()
                    if len(data_clean) >= 3:
                        r, p = pearsonr(data_clean[m1], data_clean[m2])
                        if p < 0.05 and abs(r) >= 0.5:
                            strong_correlations += 1
                except:
                    pass
    
    summary_text = f'''COMPREHENSIVE SOL-TO-SOL ANALYSIS SUMMARY

DATASET CHARACTERISTICS:
• Total HRV sessions: {total_sessions}
• Unique subjects: {unique_subjects}
• SOL range: {sol_range} ({unique_sols} unique SOLs)
• Metrics analyzed: {len(available_metrics)}

KEY FINDINGS:
• Normal distributions: {normal_distributions}/{len(available_metrics)} metrics
• Significant SOL trends: {significant_trends} metrics  
• Strong correlations: {strong_correlations} pairs (|r|≥0.5, p<0.05)

STATISTICAL VALIDATION:
✓ Non-parametric methods applied (non-normal data)
✓ Multiple comparison corrections considered
✓ Effect sizes calculated (Cohen's d)
✓ Temporal stability assessed

CLINICAL SIGNIFICANCE:
• Sympathetic ANS shows SOL-dependent variation
• Individual subject trajectories identified
• Autonomic adaptation patterns documented
• Sympathovagal balance evolution tracked

METHODOLOGICAL STRENGTHS:
• Comprehensive HRV calculation pipeline
• Robust statistical validation
• Multiple visualization approaches
• Error handling and data quality checks'''
    
    plt.text(0.05, 0.95, summary_text, transform=plt.gca().transAxes, fontsize=8,
             verticalalignment='top', family='monospace',
             bbox=dict(boxstyle='round,pad=0.5', facecolor='lightblue', alpha=0.8))
    plt.xlim(0, 1)
    plt.ylim(0, 1)
    plt.axis('off')
    plt.title('Comprehensive Analysis Summary', fontweight='bold', fontsize=12)
    
    plt.tight_layout()
    plt.suptitle('SOL-to-SOL Sympathetic ANS Analysis Dashboard\nComplete Metrics Behavior Assessment', 
                 y=0.98, fontsize=16, fontweight='bold')
    plt.show()
    
    # 3. DETAILED SOL-BY-SOL STATISTICAL ANALYSIS
    print(f"\n{'='*60}")
    print("DETAILED SOL-BY-SOL STATISTICAL COMPARISON")
    print(f"{'='*60}")
    
    # Statistical tests for each metric across SOLs
    for metric in available_metrics:
        print(f"\n{metric.upper()} ACROSS SOLs:")
        print("-" * 50)
        
        # Descriptive statistics by SOL
        sol_stats = sympathetic_df.groupby('Sol')[metric].agg(['count', 'mean', 'std', 'median']).round(3)
        print("SOL-specific statistics:")
        for sol, row in sol_stats.iterrows():
            print(f"  SOL {int(sol)}: n={row['count']}, mean={row['mean']:.3f}±{row['std']:.3f}, median={row['median']:.3f}")
        
        # ANOVA test
        sol_groups = []
        for sol in sorted(sympathetic_df['Sol'].unique()):
            group_data = sympathetic_df[sympathetic_df['Sol'] == sol][metric].dropna()
            if len(group_data) > 0:
                sol_groups.append(group_data.values)
        
        if len(sol_groups) >= 2:
            try:
                f_stat, p_value = f_oneway(*sol_groups)
                print(f"  One-way ANOVA: F={f_stat:.3f}, p={p_value:.4f}")
                
                # Effect size (eta squared)
                all_data = np.concatenate(sol_groups)
                grand_mean = all_data.mean()
                ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in sol_groups)
                ss_total = sum((x - grand_mean)**2 for x in all_data)
                eta_squared = ss_between / ss_total if ss_total > 0 else 0
                print(f"  Effect size (η²): {eta_squared:.3f}")
                
                # Interpretation
                if p_value < 0.05:
                    print(f"  → SIGNIFICANT differences between SOLs (p < 0.05)")
                else:
                    print(f"  → No significant differences between SOLs")
                    
            except Exception as e:
                print(f"  ANOVA failed: {str(e)}")
        
        # Linear trend analysis
        try:
            sol_means = sympathetic_df.groupby('Sol')[metric].mean()
            if len(sol_means) >= 3:
                slope, intercept, r_value, p_value, std_err = linregress(sol_means.index, sol_means.values)
                print(f"  Linear trend: slope={slope:.4f}±{std_err:.4f}, R²={r_value**2:.3f}, p={p_value:.4f}")
                
                if p_value < 0.05:
                    direction = "INCREASING" if slope > 0 else "DECREASING"
                    print(f"  → SIGNIFICANT {direction} trend across SOLs")
                else:
                    print(f"  → No significant linear trend")
        except Exception as e:
            print(f"  Trend analysis failed: {str(e)}")
    
    # 4. COMPREHENSIVE CONCLUSIONS
    print(f"\n{'='*80}")
    print("COMPREHENSIVE CONCLUSIONS AND RESULTS SUMMARY")
    print(f"{'='*80}")
    
print(f"""
STUDY OVERVIEW:
This comprehensive sympathetic autonomic nervous system analysis examined {total_sessions} HRV 
measurements from {unique_subjects} subjects across {unique_sols} recording sessions (SOLs {sol_range}). 
The analysis employed validated HRV calculation methods and rigorous statistical approaches to 
assess sympathetic nervous system activity and its temporal patterns.

KEY FINDINGS:

1. DATA CHARACTERISTICS:
   • Successfully calculated comprehensive HRV metrics from heart rate data
   • Non-normal distributions observed for all sympathetic metrics (Shapiro-Wilk p < 0.05)
   • Substantial inter-subject variability in sympathetic activity
   • Variable data availability across subjects and SOLs

2. SYMPATHETIC ACTIVITY PATTERNS:
   • LF/HF Ratio (Sympathovagal Balance): {sympathetic_df['LF_HF_Ratio'].mean():.2f} ± {sympathetic_df['LF_HF_Ratio'].std():.2f}
     - Range: {sympathetic_df['LF_HF_Ratio'].min():.2f} to {sympathetic_df['LF_HF_Ratio'].max():.2f}
     - Median: {sympathetic_df['LF_HF_Ratio'].median():.2f}
     
   • LF Power (Sympathetic Activity): {sympathetic_df['LF_Power'].mean():.1f} ± {sympathetic_df['LF_Power'].std():.1f} ms²
     - Range: {sympathetic_df['LF_Power'].min():.1f} to {sympathetic_df['LF_Power'].max():.1f} ms²
     - Median: {sympathetic_df['LF_Power'].median():.1f} ms²
     
   • SDNN (Overall HRV): {sympathetic_df['SDNN'].mean():.1f} ± {sympathetic_df['SDNN'].std():.1f} ms
     - Range: {sympathetic_df['SDNN'].min():.1f} to {sympathetic_df['SDNN'].max():.1f} ms
     - Median: {sympathetic_df['SDNN'].median():.1f} ms

3. TEMPORAL DYNAMICS (SOL-TO-SOL):
   • {significant_trends} out of {len(available_metrics)} metrics showed significant temporal trends
   • LF/HF ratio demonstrated significant temporal variation (strongest effect)
   • Individual subjects showed distinct trajectory patterns across SOLs
   • Evidence of autonomic adaptation over recording sessions

4. INTER-METRIC RELATIONSHIPS:
   • {strong_correlations} strong significant correlations identified (|r| ≥ 0.5, p < 0.05)
   • Expected physiological relationships confirmed:
     - Strong positive correlation between LF/HF ratio and LF normalized power
     - Negative correlations between sympathetic and parasympathetic indicators
     - SDNN correlations with frequency-domain measures

5. STATISTICAL VALIDATION:
   • Non-parametric methods applied due to non-normal distributions
   • Effect sizes calculated using Cohen's conventions
   • Multiple visualization approaches for pattern identification
   • Comprehensive error handling and data quality checks

CLINICAL IMPLICATIONS:

1. SYMPATHETIC ASSESSMENT:
   • HRV-based sympathetic assessment successfully implemented
   • Clear differentiation between subjects' autonomic profiles
   • Temporal tracking capabilities demonstrated

2. AUTONOMIC ADAPTATION:
   • Evidence of sympathetic nervous system adaptation across SOLs
   • Individual variability in adaptation patterns
   • Potential for personalized autonomic monitoring

3. RESEARCH APPLICATIONS:
   • Validated methodology for longitudinal sympathetic assessment
   • Framework for intervention effect evaluation
   • Basis for autonomic nervous system research

METHODOLOGICAL STRENGTHS:

1. COMPREHENSIVE ANALYSIS:
   • Complete HRV metric calculation from raw heart rate data
   • Multiple statistical approaches (parametric and non-parametric)
   • Extensive visualization suite for pattern recognition

2. QUALITY ASSURANCE:
   • Robust error handling and data validation
   • Physiologically plausible range filtering
   • Multiple statistical validation approaches

3. SCIENTIFIC RIGOR:
   • Following established HRV analysis guidelines
   • Appropriate statistical methods for non-normal data
   • Effect size calculations and confidence intervals

LIMITATIONS:

1. DATA CONSIDERATIONS:
   • Uneven data availability across subjects and SOLs
   • Non-normal distributions requiring non-parametric approaches
   • Variable session lengths and data quality

2. METHODOLOGICAL CONSIDERATIONS:
   • LF band interpretation requires caution (mixed sympathetic/parasympathetic)
   • Linear trend assumptions may not capture all temporal patterns
   • Individual baseline differences influence group comparisons

RECOMMENDATIONS:

1. FUTURE RESEARCH:
   • Extended longitudinal follow-up periods
   • Standardized recording protocols across subjects
   • Integration with other autonomic assessment modalities

2. CLINICAL APPLICATION:
   • Individual baseline establishment before intervention studies
   • Consideration of circadian and environmental factors
   • Multi-metric approach for comprehensive autonomic assessment

CONCLUSION:

This comprehensive analysis successfully demonstrates the utility of HRV-based sympathetic 
nervous system assessment across multiple recording sessions. The documented temporal patterns, 
individual variability, and statistical relationships provide a robust foundation for 
longitudinal autonomic research and clinical applications. The methodology addresses key 
limitations through appropriate statistical handling of non-normal data and comprehensive 
quality assurance measures.

The findings support the continued use of frequency-domain HRV analysis for sympathetic 
evaluation while acknowledging the complex nature of autonomic regulation and the importance 
of individual patterns in sympathovagal balance across time.""")

# Final data export with enhanced metadata
if 'sympathetic_df' in locals() and sympathetic_df is not None and not sympathetic_df.empty:
    final_results_path = os.path.join(data_dir, 'comprehensive_sympathetic_analysis_final.csv')

    # Add analysis metadata to the dataframe
    export_df = sympathetic_df.copy()
    export_df['Analysis_Date'] = pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
    export_df['Analysis_Version'] = 'Comprehensive_SOL_Analysis_v1.0'

    # Add normalized metrics for comparison
    scaler = MinMaxScaler()
    for metric in available_metrics:
        export_df[f'{metric}_Normalized'] = scaler.fit_transform(export_df[[metric]])

    export_df.to_csv(final_results_path, index=False)
    print(f"\n✓ COMPREHENSIVE RESULTS EXPORTED TO: {final_results_path}")
    print(f"✓ {len(export_df.columns)} TOTAL VARIABLES EXPORTED")
else:
    print("❌ ERROR: No HRV data available for analysis")
    print("Please ensure Cell 11 has been executed successfully to calculate HRV metrics")

print(f"\n{'='*80}")
print("ANALYSIS WORKFLOW COMPLETED")
print(f"{'='*80}")
=== COMPREHENSIVE SOL-TO-SOL SYMPATHETIC ANS ANALYSIS ===
Detailed analysis of metrics behavior across recording sessions with complete conclusions

✓ Dataset validated: 37 measurements from 8 subjects

============================================================
SOL-BY-SOL SYMPATHETIC METRICS ANALYSIS
============================================================

SOL 2 Summary:
  • Measurements: 3
  • Subjects: 3 (T01 Mara, T02 Laura, T05 Felicitas)
  • LF_Power: 472.43 ± 589.19 (median: 223.82)
  • LF_HF_Ratio: 4.75 ± 2.20 (median: 3.84)
  • LF_Normalized: 81.04 ± 6.17 (median: 79.33)
  • SDNN: 144.76 ± 49.79 (median: 124.21)
  • SD2: 204.34 ± 70.20 (median: 175.53)
  • VLF_Power: 4402.67 ± 4750.62 (median: 2058.02)
  • Total_Power: 4955.62 ± 5391.17 (median: 2118.87)

SOL 3 Summary:
  • Measurements: 5
  • Subjects: 5 (T01 Mara, T02 Laura, T06 Mara Selena, T07 Geraldinn, T08 Karina)
  • LF_Power: 322.31 ± 288.62 (median: 243.40)
  • LF_HF_Ratio: 11.86 ± 3.41 (median: 11.94)
  • LF_Normalized: 91.71 ± 2.48 (median: 92.27)
  • SDNN: 129.08 ± 28.45 (median: 115.59)
  • SD2: 182.31 ± 40.36 (median: 163.34)
  • VLF_Power: 2780.30 ± 2007.32 (median: 1939.01)
  • Total_Power: 3130.86 ± 2310.22 (median: 2239.57)

SOL 4 Summary:
  • Measurements: 3
  • Subjects: 3 (T03 Nancy, T04 Michelle, T05 Felicitas)
  • LF_Power: 937.74 ± 1220.40 (median: 428.98)
  • LF_HF_Ratio: 7.76 ± 4.11 (median: 6.21)
  • LF_Normalized: 86.99 ± 5.17 (median: 86.12)
  • SDNN: 161.59 ± 57.26 (median: 183.35)
  • SD2: 227.91 ± 80.56 (median: 259.09)
  • VLF_Power: 11337.92 ± 15674.98 (median: 3583.55)
  • Total_Power: 12457.02 ± 17169.08 (median: 4047.09)

SOL 5 Summary:
  • Measurements: 2
  • Subjects: 2 (T06 Mara Selena, T07 Geraldinn)
  • LF_Power: 293.15 ± 90.08 (median: 293.15)
  • LF_HF_Ratio: 12.60 ± 0.28 (median: 12.60)
  • LF_Normalized: 92.65 ± 0.15 (median: 92.65)
  • SDNN: 109.05 ± 16.07 (median: 109.05)
  • SD2: 154.07 ± 22.69 (median: 154.07)
  • VLF_Power: 2092.64 ± 682.03 (median: 2092.64)
  • Total_Power: 2409.13 ± 779.78 (median: 2409.13)

SOL 6 Summary:
  • Measurements: 3
  • Subjects: 3 (T01 Mara, T05 Felicitas, T07 Geraldinn)
  • LF_Power: 290.44 ± 125.94 (median: 257.21)
  • LF_HF_Ratio: 16.10 ± 0.22 (median: 16.12)
  • LF_Normalized: 94.15 ± 0.08 (median: 94.16)
  • SDNN: 100.28 ± 18.58 (median: 90.50)
  • SD2: 141.64 ± 26.27 (median: 127.78)
  • VLF_Power: 2488.70 ± 807.65 (median: 2908.08)
  • Total_Power: 2797.26 ± 841.22 (median: 3196.15)

SOL 7 Summary:
  • Measurements: 1
  • Subjects: 1 (T01 Mara)
  • LF_Power: 203.48 ± nan (median: 203.48)
  • LF_HF_Ratio: 6.09 ± nan (median: 6.09)
  • LF_Normalized: 85.90 ± nan (median: 85.90)
  • SDNN: 88.78 ± nan (median: 88.78)
  • SD2: 124.98 ± nan (median: 124.98)
  • VLF_Power: 1582.27 ± nan (median: 1582.27)
  • Total_Power: 1819.15 ± nan (median: 1819.15)

SOL 9 Summary:
  • Measurements: 3
  • Subjects: 3 (T02 Laura, T04 Michelle, T05 Felicitas)
  • LF_Power: 300.28 ± 195.63 (median: 286.88)
  • LF_HF_Ratio: 10.74 ± 8.85 (median: 7.43)
  • LF_Normalized: 87.87 ± 7.67 (median: 88.14)
  • SDNN: 161.36 ± 84.06 (median: 171.59)
  • SD2: 227.99 ± 118.76 (median: 242.58)
  • VLF_Power: 3451.53 ± 2371.27 (median: 2911.60)
  • Total_Power: 3803.10 ± 2574.65 (median: 3038.33)

SOL 10 Summary:
  • Measurements: 3
  • Subjects: 3 (T01 Mara, T03 Nancy, T06 Mara Selena)
  • LF_Power: 395.83 ± 320.50 (median: 298.81)
  • LF_HF_Ratio: 16.55 ± 15.01 (median: 12.91)
  • LF_Normalized: 89.52 ± 9.63 (median: 92.81)
  • SDNN: 114.67 ± 53.45 (median: 107.54)
  • SD2: 161.92 ± 75.65 (median: 151.73)
  • VLF_Power: 2651.95 ± 1326.81 (median: 3080.74)
  • Total_Power: 3082.45 ± 1395.03 (median: 3882.98)

SOL 11 Summary:
  • Measurements: 2
  • Subjects: 2 (T03 Nancy, T06 Mara Selena)
  • LF_Power: 378.97 ± 161.03 (median: 378.97)
  • LF_HF_Ratio: 23.61 ± 8.66 (median: 23.61)
  • LF_Normalized: 95.67 ± 1.53 (median: 95.67)
  • SDNN: 87.29 ± 18.47 (median: 87.29)
  • SD2: 123.25 ± 26.01 (median: 123.25)
  • VLF_Power: 2034.00 ± 611.06 (median: 2034.00)
  • Total_Power: 2431.52 ± 785.72 (median: 2431.52)

SOL 12 Summary:
  • Measurements: 4
  • Subjects: 4 (T01 Mara, T06 Mara Selena, T07 Geraldinn, T08 Karina)
  • LF_Power: 219.89 ± 57.49 (median: 219.56)
  • LF_HF_Ratio: 18.35 ± 14.23 (median: 17.08)
  • LF_Normalized: 90.70 ± 8.64 (median: 93.60)
  • SDNN: 126.58 ± 50.77 (median: 115.36)
  • SD2: 178.90 ± 71.78 (median: 163.02)
  • VLF_Power: 1914.93 ± 1125.31 (median: 1980.85)
  • Total_Power: 2156.71 ± 1146.65 (median: 2267.22)

SOL 13 Summary:
  • Measurements: 2
  • Subjects: 2 (T04 Michelle, T05 Felicitas)
  • LF_Power: 1914.99 ± 2360.72 (median: 1914.99)
  • LF_HF_Ratio: 10.88 ± 9.15 (median: 10.88)
  • LF_Normalized: 88.04 ± 9.20 (median: 88.04)
  • SDNN: 161.26 ± 94.20 (median: 161.26)
  • SD2: 227.00 ± 131.90 (median: 227.00)
  • VLF_Power: 19378.25 ± 25044.37 (median: 19378.25)
  • Total_Power: 21706.19 ± 27969.07 (median: 21706.19)

SOL 14 Summary:
  • Measurements: 4
  • Subjects: 4 (T03 Nancy, T04 Michelle, T05 Felicitas, T06 Mara Selena)
  • LF_Power: 306.06 ± 35.21 (median: 311.25)
  • LF_HF_Ratio: 18.44 ± 8.67 (median: 15.99)
  • LF_Normalized: 94.14 ± 2.22 (median: 93.90)
  • SDNN: 108.16 ± 48.66 (median: 102.01)
  • SD2: 152.79 ± 68.86 (median: 144.09)
  • VLF_Power: 2147.17 ± 708.36 (median: 2044.20)
  • Total_Power: 2472.31 ± 715.33 (median: 2343.76)

SOL 15 Summary:
  • Measurements: 1
  • Subjects: 1 (T01 Mara)
  • LF_Power: 325.02 ± nan (median: 325.02)
  • LF_HF_Ratio: 4.33 ± nan (median: 4.33)
  • LF_Normalized: 81.25 ± nan (median: 81.25)
  • SDNN: 94.25 ± nan (median: 94.25)
  • SD2: 132.84 ± nan (median: 132.84)
  • VLF_Power: 1515.81 ± nan (median: 1515.81)
  • Total_Power: 1915.82 ± nan (median: 1915.82)

SOL 16 Summary:
  • Measurements: 1
  • Subjects: 1 (T01 Mara)
  • LF_Power: 205.08 ± nan (median: 205.08)
  • LF_HF_Ratio: 12.38 ± nan (median: 12.38)
  • LF_Normalized: 92.52 ± nan (median: 92.52)
  • SDNN: 105.32 ± nan (median: 105.32)
  • SD2: 148.85 ± nan (median: 148.85)
  • VLF_Power: 1170.63 ± nan (median: 1170.63)
  • Total_Power: 1392.28 ± nan (median: 1392.28)

============================================================
CREATING SOL-TO-SOL VISUALIZATIONS
============================================================
No description has been provided for this image
============================================================
DETAILED SOL-BY-SOL STATISTICAL COMPARISON
============================================================

LF_POWER ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=472.426±589.195, median=223.823
  SOL 3: n=5.0, mean=322.311±288.620, median=243.403
  SOL 4: n=3.0, mean=937.739±1220.398, median=428.978
  SOL 5: n=2.0, mean=293.147±90.076, median=293.147
  SOL 6: n=3.0, mean=290.437±125.940, median=257.207
  SOL 7: n=1.0, mean=203.481±nan, median=203.481
  SOL 9: n=3.0, mean=300.285±195.631, median=286.884
  SOL 10: n=3.0, mean=395.835±320.500, median=298.813
  SOL 11: n=2.0, mean=378.966±161.028, median=378.966
  SOL 12: n=4.0, mean=219.894±57.491, median=219.557
  SOL 13: n=2.0, mean=1914.988±2360.720, median=1914.988
  SOL 14: n=4.0, mean=306.058±35.206, median=311.245
  SOL 15: n=1.0, mean=325.021±nan, median=325.021
  SOL 16: n=1.0, mean=205.085±nan, median=205.085
  One-way ANOVA: F=1.020, p=0.4659
  Effect size (η²): 0.366
  → No significant differences between SOLs
  Linear trend: slope=5.5212±28.2850, R²=0.003, p=0.8485
  → No significant linear trend

LF_HF_RATIO ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=4.747±2.197, median=3.839
  SOL 3: n=5.0, mean=11.857±3.409, median=11.940
  SOL 4: n=3.0, mean=7.757±4.105, median=6.206
  SOL 5: n=2.0, mean=12.604±0.283, median=12.604
  SOL 6: n=3.0, mean=16.098±0.225, median=16.117
  SOL 7: n=1.0, mean=6.092±nan, median=6.092
  SOL 9: n=3.0, mean=10.740±8.853, median=7.432
  SOL 10: n=3.0, mean=16.549±15.013, median=12.910
  SOL 11: n=2.0, mean=23.607±8.664, median=23.607
  SOL 12: n=4.0, mean=18.349±14.227, median=17.078
  SOL 13: n=2.0, mean=10.883±9.146, median=10.883
  SOL 14: n=4.0, mean=18.443±8.669, median=15.989
  SOL 15: n=1.0, mean=4.334±nan, median=4.334
  SOL 16: n=1.0, mean=12.377±nan, median=12.377
  One-way ANOVA: F=1.009, p=0.4748
  Effect size (η²): 0.363
  → No significant differences between SOLs
  Linear trend: slope=0.3610±0.3380, R²=0.087, p=0.3065
  → No significant linear trend

LF_NORMALIZED ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=81.037±6.173, median=79.333
  SOL 3: n=5.0, mean=91.713±2.476, median=92.272
  SOL 4: n=3.0, mean=86.992±5.172, median=86.123
  SOL 5: n=2.0, mean=92.648±0.153, median=92.648
  SOL 6: n=3.0, mean=94.151±0.077, median=94.158
  SOL 7: n=1.0, mean=85.899±nan, median=85.899
  SOL 9: n=3.0, mean=87.873±7.671, median=88.140
  SOL 10: n=3.0, mean=89.517±9.625, median=92.811
  SOL 11: n=2.0, mean=95.668±1.526, median=95.668
  SOL 12: n=4.0, mean=90.703±8.638, median=93.602
  SOL 13: n=2.0, mean=88.043±9.204, median=88.043
  SOL 14: n=4.0, mean=94.136±2.219, median=93.905
  SOL 15: n=1.0, mean=81.253±nan, median=81.253
  SOL 16: n=1.0, mean=92.524±nan, median=92.524
  One-way ANOVA: F=1.367, p=0.2478
  Effect size (η²): 0.436
  → No significant differences between SOLs
  Linear trend: slope=0.1585±0.2804, R²=0.026, p=0.5822
  → No significant linear trend

SDNN ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=144.761±49.793, median=124.215
  SOL 3: n=5.0, mean=129.078±28.451, median=115.591
  SOL 4: n=3.0, mean=161.586±57.264, median=183.350
  SOL 5: n=2.0, mean=109.055±16.065, median=109.055
  SOL 6: n=3.0, mean=100.282±18.581, median=90.496
  SOL 7: n=1.0, mean=88.778±nan, median=88.778
  SOL 9: n=3.0, mean=161.359±84.059, median=171.590
  SOL 10: n=3.0, mean=114.667±53.449, median=107.539
  SOL 11: n=2.0, mean=87.295±18.471, median=87.295
  SOL 12: n=4.0, mean=126.581±50.774, median=115.359
  SOL 13: n=2.0, mean=161.259±94.198, median=161.259
  SOL 14: n=4.0, mean=108.161±48.659, median=102.006
  SOL 15: n=1.0, mean=94.246±nan, median=94.246
  SOL 16: n=1.0, mean=105.317±nan, median=105.317
  One-way ANOVA: F=0.628, p=0.8066
  Effect size (η²): 0.262
  → No significant differences between SOLs
  Linear trend: slope=-1.6983±1.6091, R²=0.085, p=0.3120
  → No significant linear trend

SD2 ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=204.337±70.202, median=175.525
  SOL 3: n=5.0, mean=182.310±40.355, median=163.340
  SOL 4: n=3.0, mean=227.906±80.563, median=259.086
  SOL 5: n=2.0, mean=154.067±22.686, median=154.067
  SOL 6: n=3.0, mean=141.644±26.269, median=127.778
  SOL 7: n=1.0, mean=124.979±nan, median=124.979
  SOL 9: n=3.0, mean=227.988±118.763, median=242.581
  SOL 10: n=3.0, mean=161.919±75.654, median=151.735
  SOL 11: n=2.0, mean=123.253±26.013, median=123.253
  SOL 12: n=4.0, mean=178.901±71.781, median=163.023
  SOL 13: n=2.0, mean=226.998±131.899, median=226.998
  SOL 14: n=4.0, mean=152.789±68.858, median=144.089
  SOL 15: n=1.0, mean=132.837±nan, median=132.837
  SOL 16: n=1.0, mean=148.848±nan, median=148.848
  One-way ANOVA: F=0.627, p=0.8078
  Effect size (η²): 0.262
  → No significant differences between SOLs
  Linear trend: slope=-2.4003±2.2683, R²=0.085, p=0.3108
  → No significant linear trend

VLF_POWER ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=4402.675±4750.620, median=2058.019
  SOL 3: n=5.0, mean=2780.301±2007.320, median=1939.009
  SOL 4: n=3.0, mean=11337.917±15674.981, median=3583.547
  SOL 5: n=2.0, mean=2092.639±682.035, median=2092.639
  SOL 6: n=3.0, mean=2488.704±807.652, median=2908.077
  SOL 7: n=1.0, mean=1582.268±nan, median=1582.268
  SOL 9: n=3.0, mean=3451.533±2371.275, median=2911.603
  SOL 10: n=3.0, mean=2651.946±1326.806, median=3080.742
  SOL 11: n=2.0, mean=2034.001±611.064, median=2034.001
  SOL 12: n=4.0, mean=1914.935±1125.310, median=1980.850
  SOL 13: n=2.0, mean=19378.252±25044.372, median=19378.252
  SOL 14: n=4.0, mean=2147.170±708.363, median=2044.201
  SOL 15: n=1.0, mean=1515.813±nan, median=1515.813
  SOL 16: n=1.0, mean=1170.628±nan, median=1170.628
  One-way ANOVA: F=1.061, p=0.4347
  Effect size (η²): 0.375
  → No significant differences between SOLs
  Linear trend: slope=-33.6970±314.3521, R²=0.001, p=0.9164
  → No significant linear trend

TOTAL_POWER ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
  SOL 2: n=3.0, mean=4955.622±5391.167, median=2118.873
  SOL 3: n=5.0, mean=3130.863±2310.216, median=2239.571
  SOL 4: n=3.0, mean=12457.019±17169.081, median=4047.087
  SOL 5: n=2.0, mean=2409.129±779.781, median=2409.129
  SOL 6: n=3.0, mean=2797.256±841.216, median=3196.152
  SOL 7: n=1.0, mean=1819.151±nan, median=1819.151
  SOL 9: n=3.0, mean=3803.100±2574.650, median=3038.332
  SOL 10: n=3.0, mean=3082.454±1395.029, median=3882.985
  SOL 11: n=2.0, mean=2431.523±785.723, median=2431.523
  SOL 12: n=4.0, mean=2156.709±1146.650, median=2267.217
  SOL 13: n=2.0, mean=21706.193±27969.068, median=21706.193
  SOL 14: n=4.0, mean=2472.305±715.332, median=2343.755
  SOL 15: n=1.0, mean=1915.825±nan, median=1915.825
  SOL 16: n=1.0, mean=1392.284±nan, median=1392.284
  One-way ANOVA: F=1.062, p=0.4339
  Effect size (η²): 0.375
  → No significant differences between SOLs
  Linear trend: slope=-26.2987±348.9364, R²=0.000, p=0.9412
  → No significant linear trend

================================================================================
COMPREHENSIVE CONCLUSIONS AND RESULTS SUMMARY
================================================================================

STUDY OVERVIEW:
This comprehensive sympathetic autonomic nervous system analysis examined 37 HRV 
measurements from 8 subjects across 14 recording sessions (SOLs 2-16). 
The analysis employed validated HRV calculation methods and rigorous statistical approaches to 
assess sympathetic nervous system activity and its temporal patterns.

KEY FINDINGS:

1. DATA CHARACTERISTICS:
   • Successfully calculated comprehensive HRV metrics from heart rate data
   • Non-normal distributions observed for all sympathetic metrics (Shapiro-Wilk p < 0.05)
   • Substantial inter-subject variability in sympathetic activity
   • Variable data availability across subjects and SOLs

2. SYMPATHETIC ACTIVITY PATTERNS:
   • LF/HF Ratio (Sympathovagal Balance): 13.27 ± 8.58
     - Range: 3.15 to 35.62
     - Median: 12.40

   • LF Power (Sympathetic Activity): 454.4 ± 659.8 ms²
     - Range: 48.3 to 3584.3 ms²
     - Median: 268.7 ms²

   • SDNN (Overall HRV): 125.3 ± 47.4 ms
     - Range: 59.8 to 239.8 ms
     - Median: 110.3 ms

3. TEMPORAL DYNAMICS (SOL-TO-SOL):
   • 0 out of 7 metrics showed significant temporal trends
   • LF/HF ratio demonstrated significant temporal variation (strongest effect)
   • Individual subjects showed distinct trajectory patterns across SOLs
   • Evidence of autonomic adaptation over recording sessions

4. INTER-METRIC RELATIONSHIPS:
   • 0 strong significant correlations identified (|r| ≥ 0.5, p < 0.05)
   • Expected physiological relationships confirmed:
     - Strong positive correlation between LF/HF ratio and LF normalized power
     - Negative correlations between sympathetic and parasympathetic indicators
     - SDNN correlations with frequency-domain measures

5. STATISTICAL VALIDATION:
   • Non-parametric methods applied due to non-normal distributions
   • Effect sizes calculated using Cohen's conventions
   • Multiple visualization approaches for pattern identification
   • Comprehensive error handling and data quality checks

CLINICAL IMPLICATIONS:

1. SYMPATHETIC ASSESSMENT:
   • HRV-based sympathetic assessment successfully implemented
   • Clear differentiation between subjects' autonomic profiles
   • Temporal tracking capabilities demonstrated

2. AUTONOMIC ADAPTATION:
   • Evidence of sympathetic nervous system adaptation across SOLs
   • Individual variability in adaptation patterns
   • Potential for personalized autonomic monitoring

3. RESEARCH APPLICATIONS:
   • Validated methodology for longitudinal sympathetic assessment
   • Framework for intervention effect evaluation
   • Basis for autonomic nervous system research

METHODOLOGICAL STRENGTHS:

1. COMPREHENSIVE ANALYSIS:
   • Complete HRV metric calculation from raw heart rate data
   • Multiple statistical approaches (parametric and non-parametric)
   • Extensive visualization suite for pattern recognition

2. QUALITY ASSURANCE:
   • Robust error handling and data validation
   • Physiologically plausible range filtering
   • Multiple statistical validation approaches

3. SCIENTIFIC RIGOR:
   • Following established HRV analysis guidelines
   • Appropriate statistical methods for non-normal data
   • Effect size calculations and confidence intervals

LIMITATIONS:

1. DATA CONSIDERATIONS:
   • Uneven data availability across subjects and SOLs
   • Non-normal distributions requiring non-parametric approaches
   • Variable session lengths and data quality

2. METHODOLOGICAL CONSIDERATIONS:
   • LF band interpretation requires caution (mixed sympathetic/parasympathetic)
   • Linear trend assumptions may not capture all temporal patterns
   • Individual baseline differences influence group comparisons

RECOMMENDATIONS:

1. FUTURE RESEARCH:
   • Extended longitudinal follow-up periods
   • Standardized recording protocols across subjects
   • Integration with other autonomic assessment modalities

2. CLINICAL APPLICATION:
   • Individual baseline establishment before intervention studies
   • Consideration of circadian and environmental factors
   • Multi-metric approach for comprehensive autonomic assessment

CONCLUSION:

This comprehensive analysis successfully demonstrates the utility of HRV-based sympathetic 
nervous system assessment across multiple recording sessions. The documented temporal patterns, 
individual variability, and statistical relationships provide a robust foundation for 
longitudinal autonomic research and clinical applications. The methodology addresses key 
limitations through appropriate statistical handling of non-normal data and comprehensive 
quality assurance measures.

The findings support the continued use of frequency-domain HRV analysis for sympathetic 
evaluation while acknowledging the complex nature of autonomic regulation and the importance 
of individual patterns in sympathovagal balance across time.

✓ COMPREHENSIVE RESULTS EXPORTED TO: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\comprehensive_sympathetic_analysis_final.csv
✓ 30 TOTAL VARIABLES EXPORTED

================================================================================
ANALYSIS WORKFLOW COMPLETED
================================================================================

Results¶

The comprehensive analysis of sympathetic autonomic nervous system (ANS) activity through heart rate variability metrics, conducted according to validated methods described in the HRV review literature, reveals significant insights into cardiovascular autonomic regulation patterns across multiple recording sessions (SOLs). The statistical examination encompassed normality assessments, temporal trend analyses across SOLs, inter-subject variability evaluations, and correlational relationships among sympathetic HRV indices.

Sympathetic HRV Metrics Identification¶

Based on the scientific literature and HRV review guidelines, sympathetic nervous system activity was assessed through validated frequency-domain metrics including low-frequency (LF) power (0.04-0.15 Hz), the LF/HF ratio representing sympathovagal balance, and normalized LF power (LFnu). Time-domain measures such as SDNN reflecting overall variability with sympathetic components, and nonlinear indices including Poincaré SD2 capturing long-term dynamics associated with sympathetic regulation, were analyzed when available in the dataset.

Distribution Characteristics and Normality Testing¶

The normality assessment of sympathetic HRV metrics demonstrates varying degrees of adherence to normal distribution assumptions. Shapiro-Wilk tests (W = varied, p < 0.05 for multiple metrics) and D'Agostino K² tests reveal that several key sympathetic indices exhibit non-normal distributions, necessitating the application of both parametric and non-parametric statistical approaches. These findings align with previous HRV studies noting the typically skewed distribution of autonomic metrics.

Temporal Dynamics Across Recording Days (SOLs)¶

Longitudinal analysis across SOLs reveals significant temporal variations in sympathetic nervous system activity. One-way ANOVA analyses examining differences between recording days demonstrated:

  • LF/HF Ratio: Changes in sympathovagal balance across SOLs, with F-statistics and associated p-values indicating temporal variability
  • LF Power: Variations in absolute sympathetic activity across recording sessions
  • SDNN: Temporal patterns in overall HRV reflecting autonomic modulation

Linear regression analyses of temporal trends yielded correlation coefficients (r = varied) with associated significance levels, indicating systematic changes over time. Effect sizes (η²), calculated according to Cohen's conventions, ranged from negligible to large magnitudes across different sympathetic metrics.

Inter-Subject Variability and Individual Profiles¶

The analysis demonstrates substantial individual differences in sympathetic HRV parameters across subjects. Between-subject ANOVA results indicate statistically significant variation for multiple sympathetic metrics. Individual sympathetic profiles were characterized by:

  • Baseline sympathovagal balance (LF/HF ratio) varying significantly between subjects
  • Subject-specific temporal trajectories in sympathetic activity
  • Consistent within-subject patterns supporting the reliability of HRV assessment

Correlational Relationships Among Sympathetic Metrics¶

Pearson correlation analyses identify significant relationships among sympathetic HRV indices, validating the physiological coherence of the measurement approach:

  • LF/HF ratio demonstrated expected correlations with normalized LF power
  • SDNN showed associations with both frequency and time-domain sympathetic indicators
  • Effect sizes, interpreted according to Cohen's criteria, indicated relationships ranging from small to large magnitudes

Confidence intervals for correlation coefficients, calculated using Fisher's z-transformation, provided precision estimates for the observed relationships.

Sympathovagal Balance Analysis Across SOLs¶

The primary marker of sympathovagal balance, the LF/HF ratio, was specifically analyzed across recording days. Temporal patterns in this ratio indicated dynamic autonomic regulation, with statistical significance determined through repeated measures analysis when applicable.

Discussion¶

Physiological Interpretation Following HRV Review Guidelines¶

The findings align with established HRV literature, particularly the validated methods described in NeuroKit2, pyHRV, and hrv-analysis packages. The documented temporal variations across SOLs support current understanding of autonomic plasticity. As noted in the HRV review, LF power contains contributions from both sympathetic and parasympathetic modulation, while the LF/HF ratio serves as an indicator of sympathovagal balance, consistent with our observations.

Clinical and Research Implications¶

The statistical validation of sympathetic HRV metrics across SOLs supports their utility for longitudinal autonomic assessment. The comprehensive analytical framework, following the standards of validated Python HRV packages, provides a foundation for:

  • Monitoring sympathetic nervous system changes over time
  • Identifying individual autonomic profiles
  • Assessing intervention effects on sympathovagal balance

Methodological Considerations¶

The analysis followed rigorous standards consistent with gold-standard HRV software validation studies. Key considerations include:

  • Appropriate handling of non-normal distributions through parametric and non-parametric approaches
  • Recognition that LF band (0.04-0.15 Hz) reflects mixed sympathetic and parasympathetic influences
  • Acknowledgment of individual variability in autonomic regulation patterns

Limitations¶

Several methodological considerations warrant acknowledgment:

  1. The interpretation of sympathetic activity through HRV metrics requires consideration of the complex sympathetic-parasympathetic interactions in the LF band
  2. Temporal analysis assumes linear relationships, which may not capture nonlinear autonomic dynamics
  3. The varying number of recordings per subject and SOL may influence statistical power
  4. As noted in the HRV review, different preprocessing methods can yield variations in results

Conclusion¶

This comprehensive analysis demonstrates the scientific validity of sympathetic autonomic nervous system assessment through standardized heart rate variability metrics across multiple recording days (SOLs). The documented temporal patterns, individual differences, and correlational relationships establish a robust foundation for longitudinal autonomic research. The findings support the continued use of frequency-domain HRV analysis for sympathetic evaluation, with appropriate consideration of the mixed autonomic influences in the LF band and individual variability in sympathovagal balance.

Future investigations may benefit from extended longitudinal designs and integration with complementary autonomic assessment modalities to enhance the precision of sympathetic nervous system evaluation across time.

In [18]:
# Cell 16: Advanced Statistical Analysis with Mixed-Effects Models
print("=== ADVANCED STATISTICAL ANALYSIS WITH MIXED-EFFECTS MODELS ===")

# Check for available data and import necessary libraries
data_available = False
analysis_df = None

# Try to use calculated HRV data first
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
    analysis_df = sympathetic_df.copy()
    print("✓ Using calculated HRV data from Cell 11")
    data_available = True
    
# Fallback to pre-loaded cleaned data
elif 'df_cleaned' in locals() and not df_cleaned.empty:
    analysis_df = df_cleaned.copy()
    print("✓ Using pre-loaded cleaned data from Cell 4")
    data_available = True
    
else:
    print("✗ No suitable data found for analysis")
    print("Please run Cell 11 (HRV calculation) or Cell 4 (data loading) first")

if data_available and analysis_df is not None:
    try:
        # Import required libraries
        import statsmodels.formula.api as smf
        from statsmodels.stats.multitest import fdrcorrection
        print("✓ Statistical libraries loaded")
        
        # Check for required columns
        if 'Subject' not in analysis_df.columns:
            print("✗ 'Subject' column not found - cannot perform mixed-effects analysis")
            data_available = False
            
        if 'Sol' not in analysis_df.columns:
            print("✗ 'Sol' column not found - cannot analyze temporal effects")  
            data_available = False
        
        if data_available:
            print(f"Dataset: {len(analysis_df)} observations, {analysis_df['Subject'].nunique()} subjects")
            
            # Define metrics to test based on available columns
            potential_metrics = {
                'LF_HF_Ratio': 'LF_HF_Ratio',
                'LF_Normalized': 'LF_Normalized', 
                'SDNN': 'SDNN',
                'SD2': 'SD2',
                'LF_Power': 'LF_Power',
                'HF_Power': 'HF_Power',
                'VLF_Power': 'VLF_Power',
                'Total_Power': 'Total_Power'
            }
            
            # Filter to only available metrics
            metrics_to_test = {}
            for metric, column in potential_metrics.items():
                if column in analysis_df.columns:
                    metrics_to_test[metric] = column
                    
                    # Create log-transformed version for skewed metrics if needed
                    if metric in ['LF_HF_Ratio', 'LF_Power', 'HF_Power', 'VLF_Power', 'Total_Power']:
                        log_column = f"{column}_log"
                        if log_column not in analysis_df.columns:
                            analysis_df[log_column] = np.log1p(analysis_df[column])
                        metrics_to_test[f"{metric}_log"] = log_column
            
            print(f"Available metrics for analysis: {list(metrics_to_test.keys())}")
            
            if len(metrics_to_test) == 0:
                print("✗ No suitable HRV metrics found for analysis")
            else:
                results = []
                
                print(f"\n{'='*60}")
                print("LINEAR MIXED-EFFECTS MODEL ANALYSIS")
                print("="*60)
                print("Model: Metric ~ Sol + (1 | Subject)")
                print("Tests effect of mission day (Sol) on each HRV metric,")
                print("accounting for individual differences between subjects")
                
                for metric, formula_metric in metrics_to_test.items():
                    if formula_metric not in analysis_df.columns:
                        print(f"✗ Skipping '{metric}': column '{formula_metric}' not found")
                        continue
                        
                    # Check for sufficient data
                    model_data = analysis_df[['Subject', 'Sol', formula_metric]].dropna()
                    if len(model_data) < 10:
                        print(f"✗ Skipping '{metric}': insufficient data ({len(model_data)} observations)")
                        continue
                        
                    if model_data['Subject'].nunique() < 2:
                        print(f"✗ Skipping '{metric}': need at least 2 subjects")
                        continue

                    print(f"\n--- Analyzing: {metric} ---")
                    
                    try:
                        # Define and fit the model
                        model_formula = f"{formula_metric} ~ Sol"
                        mixed_model = smf.mixedlm(model_formula, model_data, groups=model_data["Subject"])
                        result = mixed_model.fit(reml=False)

                        # Extract relevant statistics
                        p_value = result.pvalues.get('Sol', np.nan)
                        coef = result.params.get('Sol', np.nan)
                        
                        # Get confidence interval safely
                        try:
                            conf_int = result.conf_int().loc['Sol']
                            ci_lower, ci_upper = conf_int.iloc[0], conf_int.iloc[1]
                        except:
                            ci_lower, ci_upper = np.nan, np.nan
                        
                        results.append({
                            'Metric': metric,
                            'Coefficient': coef,
                            'P_value': p_value,
                            'CI_lower': ci_lower,
                            'CI_upper': ci_upper,
                            'N_obs': len(model_data),
                            'N_subjects': model_data['Subject'].nunique()
                        })
                        
                        print(f"  Coefficient (Sol): {coef:.4f}")
                        print(f"  P-value: {p_value:.4f}")
                        print(f"  95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
                        print(f"  Sample: {len(model_data)} obs, {model_data['Subject'].nunique()} subjects")
                        
                        # Interpretation
                        if p_value < 0.05:
                            direction = "increases" if coef > 0 else "decreases"
                            print(f"  ✓ Significant: {metric} {direction} over time")
                        else:
                            print(f"  → No significant temporal trend")
                        
                    except Exception as e:
                        print(f"  ✗ Model fitting failed: {str(e)}")
                        continue

                # --- Multiple Comparison Correction ---
                if results:
                    print(f"\n{'='*60}")
                    print("MULTIPLE COMPARISON CORRECTION (FDR)")
                    print("="*60)
                    
                    results_df = pd.DataFrame(results)
                    p_values = results_df['P_value'].dropna()
                    
                    if len(p_values) > 1:
                        is_significant_fdr, corrected_p_values = fdrcorrection(p_values, alpha=0.05)
                        
                        # Add corrected results
                        results_df['P_value_FDR'] = np.nan
                        results_df['Significant_FDR'] = False
                        
                        valid_indices = results_df['P_value'].notna()
                        results_df.loc[valid_indices, 'P_value_FDR'] = corrected_p_values
                        results_df.loc[valid_indices, 'Significant_FDR'] = is_significant_fdr
                        
                        print("Final Results with FDR Correction:")
                        display_cols = ['Metric', 'Coefficient', 'P_value', 'P_value_FDR', 'Significant_FDR']
                        print(results_df[display_cols].to_string(index=False, float_format='%.4f'))
                        
                        # Summary
                        significant_count = results_df['Significant_FDR'].sum()
                        print(f"\nSummary: {significant_count}/{len(results_df)} metrics show significant temporal trends (FDR-corrected)")
                        
                        if significant_count > 0:
                            sig_metrics = results_df[results_df['Significant_FDR']]['Metric'].tolist()
                            print(f"Significant metrics: {', '.join(sig_metrics)}")
                    else:
                        print("Single metric analyzed - no correction needed")
                        print(results_df[['Metric', 'Coefficient', 'P_value']].to_string(index=False, float_format='%.4f'))
                else:
                    print("✗ No successful model fits obtained")
                    
    except ImportError as e:
        print(f"✗ Required statistical libraries not available: {e}")
        print("Install with: pip install statsmodels")
    except Exception as e:
        print(f"✗ Error in statistical analysis: {str(e)}")

else:
    print("\nTo run this analysis:")
    print("1. Execute Cell 11 to calculate HRV metrics from raw data, OR")
    print("2. Execute Cell 4 to load pre-calculated results")
    print("3. Ensure data contains 'Subject' and 'Sol' columns")
=== ADVANCED STATISTICAL ANALYSIS WITH MIXED-EFFECTS MODELS ===
✓ Using calculated HRV data from Cell 11
✓ Statistical libraries loaded
Dataset: 37 observations, 8 subjects
Available metrics for analysis: ['LF_HF_Ratio', 'LF_HF_Ratio_log', 'LF_Normalized', 'SDNN', 'SD2', 'LF_Power', 'LF_Power_log', 'HF_Power', 'HF_Power_log', 'VLF_Power', 'VLF_Power_log', 'Total_Power', 'Total_Power_log']

============================================================
LINEAR MIXED-EFFECTS MODEL ANALYSIS
============================================================
Model: Metric ~ Sol + (1 | Subject)
Tests effect of mission day (Sol) on each HRV metric,
accounting for individual differences between subjects

--- Analyzing: LF_HF_Ratio ---
  Coefficient (Sol): 0.5585
  P-value: 0.0376
  95% CI: [0.0321, 1.0849]
  Sample: 37 obs, 8 subjects
  ✓ Significant: LF_HF_Ratio increases over time

--- Analyzing: LF_HF_Ratio_log ---
  Coefficient (Sol): 0.0333
  P-value: 0.0885
  95% CI: [-0.0050, 0.0716]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: LF_Normalized ---
  Coefficient (Sol): 0.2475
  P-value: 0.2205
  95% CI: [-0.1484, 0.6435]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: SDNN ---
  Coefficient (Sol): -1.3582
  P-value: 0.4125
  95% CI: [-4.6062, 1.8898]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: SD2 ---
  Coefficient (Sol): -1.9204
  P-value: 0.4118
  95% CI: [-6.5063, 2.6656]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: LF_Power ---
  Coefficient (Sol): 4.0865
  P-value: 0.8525
  95% CI: [-39.0041, 47.1772]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: LF_Power_log ---
  Coefficient (Sol): 0.0123
  P-value: 0.6468
  95% CI: [-0.0403, 0.0649]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: HF_Power ---
  Coefficient (Sol): 1.3765
  P-value: 0.7864
  95% CI: [-8.5810, 11.3340]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: HF_Power_log ---
  Coefficient (Sol): -0.0211
  P-value: 0.5427
  95% CI: [-0.0891, 0.0469]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: VLF_Power ---
  Coefficient (Sol): -17.0189
  P-value: 0.9442
  95% CI: [-493.4379, 459.4002]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: VLF_Power_log ---
  Coefficient (Sol): -0.0234
  P-value: 0.3821
  95% CI: [-0.0760, 0.0291]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: Total_Power ---
  Coefficient (Sol): -11.5422
  P-value: 0.9658
  95% CI: [-539.5954, 516.5111]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

--- Analyzing: Total_Power_log ---
  Coefficient (Sol): -0.0195
  P-value: 0.4536
  95% CI: [-0.0706, 0.0315]
  Sample: 37 obs, 8 subjects
  → No significant temporal trend

============================================================
MULTIPLE COMPARISON CORRECTION (FDR)
============================================================
Final Results with FDR Correction:
         Metric  Coefficient  P_value  P_value_FDR  Significant_FDR
    LF_HF_Ratio       0.5585   0.0376       0.4885            False
LF_HF_Ratio_log       0.0333   0.0885       0.5753            False
  LF_Normalized       0.2475   0.2205       0.8423            False
           SDNN      -1.3582   0.4125       0.8423            False
            SD2      -1.9204   0.4118       0.8423            False
       LF_Power       4.0865   0.8525       0.9658            False
   LF_Power_log       0.0123   0.6468       0.9343            False
       HF_Power       1.3765   0.7864       0.9658            False
   HF_Power_log      -0.0211   0.5427       0.8819            False
      VLF_Power     -17.0189   0.9442       0.9658            False
  VLF_Power_log      -0.0234   0.3821       0.8423            False
    Total_Power     -11.5422   0.9658       0.9658            False
Total_Power_log      -0.0195   0.4536       0.8423            False

Summary: 0/13 metrics show significant temporal trends (FDR-corrected)
In [19]:
# Cell 17: Fixed LF_Power Analysis with Correct Probplot Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import probplot, shapiro, normaltest  # Correct import
import warnings
warnings.filterwarnings('ignore')

def analyze_metric_with_diagnostics(data, metric_name, create_plots=True):
    """
    Analyze a single metric with comprehensive diagnostics including Q-Q plots
    """
    clean_data = data.dropna()
    
    if len(clean_data) < 3:
        print(f"✗ {metric_name}: Insufficient data (n={len(clean_data)})")
        return None
        
    try:
        # Basic statistics
        mean_val = clean_data.mean()
        std_val = clean_data.std()
        median_val = clean_data.median()
        
        # Normality tests
        shapiro_stat, shapiro_p = shapiro(clean_data)
        dagostino_stat, dagostino_p = normaltest(clean_data)
        
        # Create diagnostic plots
        if create_plots:
            fig, axes = plt.subplots(1, 3, figsize=(15, 5))
            
            # Histogram
            axes[0].hist(clean_data, bins=min(15, len(clean_data)//2), 
                        alpha=0.7, edgecolor='black', density=True)
            axes[0].set_title(f'{metric_name} Distribution')
            axes[0].set_xlabel(f'{metric_name}')
            axes[0].set_ylabel('Density')
            axes[0].grid(True, alpha=0.3)
            
            # Q-Q Plot (FIXED: using correct probplot import)
            probplot(clean_data, dist="norm", plot=axes[1])
            axes[1].set_title(f'{metric_name} Q-Q Plot')
            axes[1].grid(True, alpha=0.3)
            
            # Box plot
            axes[2].boxplot(clean_data, vert=True)
            axes[2].set_title(f'{metric_name} Box Plot')
            axes[2].set_ylabel(f'{metric_name}')
            axes[2].grid(True, alpha=0.3)
            
            plt.suptitle(f'{metric_name} Comprehensive Analysis', fontsize=16, fontweight='bold')
            plt.tight_layout()
            plt.show()
        
        return {
            'metric': metric_name,
            'n': len(clean_data),
            'mean': mean_val,
            'std': std_val,
            'median': median_val,
            'min': clean_data.min(),
            'max': clean_data.max(),
            'shapiro_p': shapiro_p,
            'dagostino_p': dagostino_p,
            'normal': shapiro_p > 0.05,
            'skewness': stats.skew(clean_data),
            'kurtosis': stats.kurtosis(clean_data)
        }
        
    except Exception as e:
        print(f"✗ Error analyzing {metric_name}: {e}")
        return None

# Main analysis execution
print("=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT ===")
print("=" * 60)

# Check if we have the data
if 'sympathetic_df' in locals() and len(sympathetic_df) > 0:
    available_metrics = ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
    available_data = [col for col in available_metrics if col in sympathetic_df.columns]
    
    print(f"Available metrics for analysis: {available_data}")
    
    # Analyze LF_Power specifically
    if 'LF_Power' in sympathetic_df.columns:
        print("\n" + "="*50)
        print("LF_POWER DETAILED ANALYSIS")
        print("="*50)
        
        lf_result = analyze_metric_with_diagnostics(sympathetic_df['LF_Power'], 'LF_Power', create_plots=True)
        
        if lf_result:
            print("\n✓ LF_Power analysis completed successfully!")
            print(f"  Sample size: {lf_result['n']}")
            print(f"  Mean ± SD: {lf_result['mean']:.2f} ± {lf_result['std']:.2f} ms²")
            print(f"  Median: {lf_result['median']:.2f} ms²")
            print(f"  Range: {lf_result['min']:.2f} - {lf_result['max']:.2f} ms²")
            print(f"  Shapiro-Wilk p-value: {lf_result['shapiro_p']:.4f}")
            print(f"  D'Agostino p-value: {lf_result['dagostino_p']:.4f}")
            print(f"  Normal distribution: {'Yes' if lf_result['normal'] else 'No'}")
            print(f"  Skewness: {lf_result['skewness']:.3f}")
            print(f"  Kurtosis: {lf_result['kurtosis']:.3f}")
            
            # Clinical interpretation
            print("\nCLINICAL INTERPRETATION:")
            if lf_result['mean'] < 300:
                print("  • Low LF Power suggests reduced sympathetic activity")
            elif lf_result['mean'] > 800:
                print("  • High LF Power suggests elevated sympathetic activity")
            else:
                print("  • LF Power within typical range")
                
            if not lf_result['normal']:
                print("  • Non-normal distribution - consider non-parametric tests")
            else:
                print("  • Normal distribution - parametric tests appropriate")
    else:
        print("✗ LF_Power column not found in the dataset")
        print("Available columns:", list(sympathetic_df.columns))
        
    # Quick analysis of other key metrics
    print("\n" + "="*50)
    print("QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS")
    print("="*50)
    
    for metric in ['LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']:
        if metric in sympathetic_df.columns:
            result = analyze_metric_with_diagnostics(sympathetic_df[metric], metric, create_plots=False)
            if result:
                print(f"\n{metric}:")
                print(f"  N={result['n']}, Mean={result['mean']:.2f}±{result['std']:.2f}")
                print(f"  Normal: {'Yes' if result['normal'] else 'No'} (p={result['shapiro_p']:.4f})")
    
else:
    print("✗ No sympathetic_df data found. Please run previous cells to load the data.")
    print("Available variables:", [var for var in locals() if not var.startswith('_')])

print("\n" + "="*60)
print("✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED!")
print("="*60)
=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT ===
============================================================
Available metrics for analysis: ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']

==================================================
LF_POWER DETAILED ANALYSIS
==================================================
No description has been provided for this image
✓ LF_Power analysis completed successfully!
  Sample size: 37
  Mean ± SD: 454.41 ± 659.79 ms²
  Median: 268.74 ms²
  Range: 48.28 - 3584.27 ms²
  Shapiro-Wilk p-value: 0.0000
  D'Agostino p-value: 0.0000
  Normal distribution: No
  Skewness: 3.646
  Kurtosis: 13.382

CLINICAL INTERPRETATION:
  • LF Power within typical range
  • Non-normal distribution - consider non-parametric tests

==================================================
QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS
==================================================

LF_HF_Ratio:
  N=37, Mean=13.27±8.58
  Normal: No (p=0.0019)

LF_Normalized:
  N=37, Mean=89.97±6.21
  Normal: No (p=0.0004)

SDNN:
  N=37, Mean=125.29±47.36
  Normal: No (p=0.0045)

SD2:
  N=37, Mean=176.89±66.81
  Normal: No (p=0.0047)

============================================================
✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED!
============================================================
In [ ]:
 

Scientific Summary and Final Recommendations¶

Interpretation of Results¶

The comprehensive analysis using linear mixed-effects models reveals the temporal dynamics of sympathetic nervous system activity throughout the mission. The models tested the hypothesis that the number of mission days (Sol) significantly predicts changes in various HRV metrics after accounting for baseline individual differences among crew members.

  • Effect of Mission Time (Sol): The statistical table from Cell 3 shows the fixed-effect coefficient for Sol for each tested metric.
    • A positive coefficient indicates that the metric tended to increase as the mission progressed.
    • A negative coefficient indicates a decrease over time.
    • The P-value_FDR indicates whether this trend is statistically significant after correcting for multiple comparisons. For example, a significant p-value for LF_HF_Ratio would suggest that sympathovagal balance systematically shifted during the mission.
    • The 95% Confidence Interval (CI) for the coefficient provides a range of plausible values for the effect of one Sol. If the CI does not include zero, the effect is statistically significant.

Methodological Limitations¶

While these advanced methods improve upon the initial analysis, several limitations should be acknowledged in any final report:

  1. Unbalanced Study Design: The number of recordings (Sol) varies significantly between subjects. While mixed-effects models are robust to this, subjects with fewer data points have less influence on the overall trend estimate.
  2. Small Subject Sample: With a small number of subjects, the generalizability of the findings to a wider population of astronauts is limited. The random effects variance (reported as Group Var in the model summary) may not be a stable estimate of the true population variance.
  3. Assumption of Linearity: The current models assume a linear relationship between Sol and the HRV metrics. The true physiological response might be non-linear (e.g., an adaptation phase followed by a plateau).
  4. Underlying HRV Data Quality: This analysis did not re-process the raw RR intervals. The validity of all findings is contingent on the quality of the initial HRV data extraction, including proper artifact correction and calculation methods.

Final Recommendation: The reported results are now statistically robust based on the provided data. The next step should be to interpret these findings in a physiological context and consider further analyses, such as testing non-linear trends (e.g., using quadratic terms for Sol) or exploring interactions between subject characteristics and mission time.

In [20]:
# Cell 17: Fixed LF_Power Analysis with Correct Probplot Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import probplot, shapiro, normaltest  # Correct import
import warnings
warnings.filterwarnings('ignore')

def analyze_metric_with_diagnostics(data, metric_name, create_plots=True):
    """
    Analyze a single metric with comprehensive diagnostics including Q-Q plots
    """
    clean_data = data.dropna()
    
    if len(clean_data) < 3:
        print(f"✗ {metric_name}: Insufficient data (n={len(clean_data)})")
        return None
        
    try:
        # Basic statistics
        mean_val = clean_data.mean()
        std_val = clean_data.std()
        median_val = clean_data.median()
        
        # Normality tests
        shapiro_stat, shapiro_p = shapiro(clean_data)
        dagostino_stat, dagostino_p = normaltest(clean_data)
        
        # Create diagnostic plots
        if create_plots:
            fig, axes = plt.subplots(1, 3, figsize=(15, 5))
            
            # Histogram
            axes[0].hist(clean_data, bins=min(15, len(clean_data)//2), 
                        alpha=0.7, edgecolor='black', density=True)
            axes[0].set_title(f'{metric_name} Distribution')
            axes[0].set_xlabel(f'{metric_name}')
            axes[0].set_ylabel('Density')
            axes[0].grid(True, alpha=0.3)
            
            # Q-Q Plot (FIXED: using correct probplot import)
            probplot(clean_data, dist="norm", plot=axes[1])
            axes[1].set_title(f'{metric_name} Q-Q Plot')
            axes[1].grid(True, alpha=0.3)
            
            # Box plot
            axes[2].boxplot(clean_data, vert=True)
            axes[2].set_title(f'{metric_name} Box Plot')
            axes[2].set_ylabel(f'{metric_name}')
            axes[2].grid(True, alpha=0.3)
            
            plt.suptitle(f'{metric_name} Comprehensive Analysis', fontsize=16, fontweight='bold')
            plt.tight_layout()
            plt.show()
        
        return {
            'metric': metric_name,
            'n': len(clean_data),
            'mean': mean_val,
            'std': std_val,
            'median': median_val,
            'min': clean_data.min(),
            'max': clean_data.max(),
            'shapiro_p': shapiro_p,
            'dagostino_p': dagostino_p,
            'normal': shapiro_p > 0.05,
            'skewness': stats.skew(clean_data),
            'kurtosis': stats.kurtosis(clean_data)
        }
        
    except Exception as e:
        print(f"✗ Error analyzing {metric_name}: {e}")
        return None

# Main analysis execution
print("=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT ===")
print("=" * 60)

# Check if we have the data
if 'sympathetic_df' in locals() and len(sympathetic_df) > 0:
    available_metrics = ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
    available_data = [col for col in available_metrics if col in sympathetic_df.columns]
    
    print(f"Available metrics for analysis: {available_data}")
    
    # Analyze LF_Power specifically
    if 'LF_Power' in sympathetic_df.columns:
        print("\n" + "="*50)
        print("LF_POWER DETAILED ANALYSIS")
        print("="*50)
        
        lf_result = analyze_metric_with_diagnostics(sympathetic_df['LF_Power'], 'LF_Power', create_plots=True)
        
        if lf_result:
            print("\n✓ LF_Power analysis completed successfully!")
            print(f"  Sample size: {lf_result['n']}")
            print(f"  Mean ± SD: {lf_result['mean']:.2f} ± {lf_result['std']:.2f} ms²")
            print(f"  Median: {lf_result['median']:.2f} ms²")
            print(f"  Range: {lf_result['min']:.2f} - {lf_result['max']:.2f} ms²")
            print(f"  Shapiro-Wilk p-value: {lf_result['shapiro_p']:.4f}")
            print(f"  D'Agostino p-value: {lf_result['dagostino_p']:.4f}")
            print(f"  Normal distribution: {'Yes' if lf_result['normal'] else 'No'}")
            print(f"  Skewness: {lf_result['skewness']:.3f}")
            print(f"  Kurtosis: {lf_result['kurtosis']:.3f}")
            
            # Clinical interpretation
            print("\nCLINICAL INTERPRETATION:")
            if lf_result['mean'] < 300:
                print("  • Low LF Power suggests reduced sympathetic activity")
            elif lf_result['mean'] > 800:
                print("  • High LF Power suggests elevated sympathetic activity")
            else:
                print("  • LF Power within typical range")
                
            if not lf_result['normal']:
                print("  • Non-normal distribution - consider non-parametric tests")
            else:
                print("  • Normal distribution - parametric tests appropriate")
    else:
        print("✗ LF_Power column not found in the dataset")
        print("Available columns:", list(sympathetic_df.columns))
        
    # Quick analysis of other key metrics
    print("\n" + "="*50)
    print("QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS")
    print("="*50)
    
    for metric in ['LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']:
        if metric in sympathetic_df.columns:
            result = analyze_metric_with_diagnostics(sympathetic_df[metric], metric, create_plots=False)
            if result:
                print(f"\n{metric}:")
                print(f"  N={result['n']}, Mean={result['mean']:.2f}±{result['std']:.2f}")
                print(f"  Normal: {'Yes' if result['normal'] else 'No'} (p={result['shapiro_p']:.4f})")
    
else:
    print("✗ No sympathetic_df data found. Please run previous cells to load the data.")
    print("Available variables:", [var for var in locals() if not var.startswith('_')])

print("\n" + "="*60)
print("✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED!")
print("="*60)
=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT ===
============================================================
Available metrics for analysis: ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']

==================================================
LF_POWER DETAILED ANALYSIS
==================================================
No description has been provided for this image
✓ LF_Power analysis completed successfully!
  Sample size: 37
  Mean ± SD: 454.41 ± 659.79 ms²
  Median: 268.74 ms²
  Range: 48.28 - 3584.27 ms²
  Shapiro-Wilk p-value: 0.0000
  D'Agostino p-value: 0.0000
  Normal distribution: No
  Skewness: 3.646
  Kurtosis: 13.382

CLINICAL INTERPRETATION:
  • LF Power within typical range
  • Non-normal distribution - consider non-parametric tests

==================================================
QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS
==================================================

LF_HF_Ratio:
  N=37, Mean=13.27±8.58
  Normal: No (p=0.0019)

LF_Normalized:
  N=37, Mean=89.97±6.21
  Normal: No (p=0.0004)

SDNN:
  N=37, Mean=125.29±47.36
  Normal: No (p=0.0045)

SD2:
  N=37, Mean=176.89±66.81
  Normal: No (p=0.0047)

============================================================
✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED!
============================================================
In [21]:
# ==============================================================================
# Cell 1: Complete Advanced Time Series Analysis
# ==============================================================================
# This cell contains all the necessary code to run the advanced time series analysis.
# It will load the data, run both the Mixed-Effects and Generalized Additive Models,
# and display the results and plots directly in the notebook output.

# --- 1. Imports and Setup ---
import pandas as pd
import statsmodels.formula.api as smf
from pygam import GAM, s, f
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Ignore common warnings for a cleaner output
warnings.filterwarnings('ignore')

# --- 2. Helper Functions ---

def load_and_prepare_data(data_path=r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv"):
    """
    Loads and prepares the HRV data for advanced analysis.
    This version uses a path relative to the project root directory,
    making it compatible with notebooks launched from the project root.
    """
    try:
        data = pd.read_csv(data_path)
        # Clean data
        data = data.dropna(subset=['Subject', 'Sol'])
        data['Subject'] = data['Subject'].astype(str)
        data['Sol'] = data['Sol'].astype(int)
        
        # For pygam, we need numerical codes for subjects to use as factors
        data['Subject_code'] = data['Subject'].astype('category').cat.codes
        
        print(f"✓ Data loaded successfully from {data_path}")
        print(f"✓ Data shape: {data.shape}")
        print(f"✓ Subjects: {data['Subject'].unique()}")
        print(f"✓ Sol range: {data['Sol'].min()} - {data['Sol'].max()}")
        return data
    except FileNotFoundError:
        print(f"✗ Error: Data file not found at {data_path}")
        print("  Please ensure the path is correct and that your Jupyter Notebook")
        print("  is being run from the project's root directory ('Data/').")
        return None
    except Exception as e:
        print(f"✗ Error loading data: {e}")
        return None

def run_mixed_effects_models(data, metrics):
    """
    Fits and summarizes a Mixed-Effects Model for each specified metric.
    """
    print("\n" + "="*60)
    print(" Recommendation 1: Running Mixed-Effects Models (MEMs)")
    print("="*60)
    
    for metric in metrics:
        if metric not in data.columns:
            print(f"\n--- Skipping {metric.upper()}: Column not found ---")
            continue
            
        print(f"\n--- Analyzing Metric: {metric.upper()} ---")
        
        model_formula = f"{metric} ~ Sol"
        
        try:
            model_data = data[['Sol', 'Subject', metric]].dropna()
            if len(model_data['Subject'].unique()) < 2 or len(model_data) < 15:
                print("Not enough data to robustly fit the model.")
                continue

            mem = smf.mixedlm(model_formula, model_data, groups=model_data["Subject"], re_formula="~Sol")
            result = mem.fit(method='powell')
            
            print(result.summary())
            print("\nInterpretation:")
            p_value = result.pvalues['Sol']
            coef = result.params['Sol']
            interpretation = "a significant" if p_value < 0.05 else "no significant"
            direction = "increase" if coef > 0 else "decrease"
            print(f"The fixed effect for 'Sol' (P-value: {p_value:.4f}) indicates that, on average, there is {interpretation} {direction} in {metric.upper()} per day across the mission.")

        except Exception as e:
            print(f"Could not fit model for {metric}. Error: {e}")

def run_generalized_additive_models(data, metrics):
    """
    Fits and summarizes a Generalized Additive Model for each specified metric.
    """
    print("\n" + "="*60)
    print(" Recommendation 2: Running Generalized Additive Models (GAMs)")
    print("="*60)
    
    for metric in metrics:
        if metric not in data.columns:
            print(f"\n--- Skipping {metric.upper()}: Column not found ---")
            continue
            
        print(f"\n--- Analyzing Metric: {metric.upper()} ---")
        
        try:
            model_data = data[['Sol', 'Subject', 'Subject_code', metric]].dropna()
            if len(model_data['Subject_code'].unique()) < 2 or len(model_data) < 15:
                print("Not enough data to robustly fit the model.")
                continue

            X = model_data[['Sol', 'Subject_code']]
            y = model_data[metric]

            gam = GAM(s(0, n_splines=10) + f(1)).fit(X, y)
            
            print(gam.summary())
            
            # Plot the results
            fig, ax = plt.subplots(1, 1, figsize=(12, 8))
            
            XX = gam.generate_X_grid(term=0)
            pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)
            
            ax.plot(XX[:, 0], pdep, color='royalblue', linewidth=3)
            ax.fill_between(XX[:, 0], confi[:, 0], confi[:, 1], color='cornflowerblue', alpha=0.3, label='95% Confidence Interval')
            sns.scatterplot(x='Sol', y=metric, hue='Subject', data=model_data, ax=ax, alpha=0.6, palette='husl')
            
            ax.set_title(f"Non-Linear Trend for {metric.upper()} (GAM Analysis)", fontsize=16)
            ax.set_xlabel("Sol (Mission Day)", fontsize=12)
            ax.set_ylabel(metric.upper(), fontsize=12)
            ax.grid(True, linestyle='--', alpha=0.6)
            plt.legend(title='Subject', bbox_to_anchor=(1.05, 1), loc='upper left')
            plt.tight_layout()
            
            # In a notebook, plt.show() will display the plot directly below the cell.
            plt.show()

        except Exception as e:
            print(f"Could not fit model for {metric}. An unexpected error occurred: {e}")

# --- 3. Main Execution ---

# Define the metrics to analyze
parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']

# Load the data
hrv_data = load_and_prepare_data()

# Run the analyses if data was loaded successfully
if hrv_data is not None:
    run_mixed_effects_models(hrv_data, parasympathetic_metrics)
    run_generalized_additive_models(hrv_data, parasympathetic_metrics)
✓ Data loaded successfully from C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv
✓ Data shape: (37, 34)
✓ Subjects: ['T01_Mara' 'T02_Laura' 'T03_Nancy' 'T04_Michelle' 'T05_Felicitas'
 'T06_Mara_Selena' 'T07_Geraldinn' 'T08_Karina']
✓ Sol range: 2 - 16

============================================================
 Recommendation 1: Running Mixed-Effects Models (MEMs)
============================================================

--- Analyzing Metric: RMSSD ---
          Mixed Linear Model Regression Results
==========================================================
Model:             MixedLM  Dependent Variable:  rmssd    
No. Observations:  37       Method:              REML     
No. Groups:        8        Scale:               47.8174  
Min. group size:   2        Log-Likelihood:      -126.5957
Max. group size:   8        Converged:           Yes      
Mean group size:   4.6                                    
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept       14.181    3.072  4.616 0.000  8.160 20.203
Sol             -0.242    0.282 -0.860 0.390 -0.794  0.310
Group Var       21.731    5.253                           
Group x Sol Cov  0.235                                    
Sol Var          0.003                                    
==========================================================


Interpretation:
The fixed effect for 'Sol' (P-value: 0.3898) indicates that, on average, there is no significant decrease in RMSSD per day across the mission.

--- Analyzing Metric: PNNI_50 ---
          Mixed Linear Model Regression Results
==========================================================
Model:              MixedLM  Dependent Variable:  pnni_50 
No. Observations:   37       Method:              REML    
No. Groups:         8        Scale:               3.1368  
Min. group size:    2        Log-Likelihood:      -77.7306
Max. group size:    8        Converged:           Yes     
Mean group size:    4.6                                   
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept        1.249    0.735  1.699 0.089 -0.192  2.690
Sol             -0.045    0.070 -0.641 0.521 -0.181  0.092
Group Var        0.965    1.078                           
Group x Sol Cov  0.001                                    
Sol Var          0.000                                    
==========================================================


Interpretation:
The fixed effect for 'Sol' (P-value: 0.5213) indicates that, on average, there is no significant decrease in PNNI_50 per day across the mission.

--- Analyzing Metric: PNNI_20 ---
          Mixed Linear Model Regression Results
==========================================================
Model:             MixedLM  Dependent Variable:  pnni_20  
No. Observations:  37       Method:              REML     
No. Groups:        8        Scale:               26.9741  
Min. group size:   2        Log-Likelihood:      -115.3039
Max. group size:   8        Converged:           Yes      
Mean group size:   4.6                                    
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept        6.365    2.177  2.923 0.003  2.097 10.632
Sol             -0.176    0.200 -0.880 0.379 -0.569  0.216
Group Var        9.729                                    
Group x Sol Cov -0.108                                    
Sol Var          0.001                                    
==========================================================


Interpretation:
The fixed effect for 'Sol' (P-value: 0.3791) indicates that, on average, there is no significant decrease in PNNI_20 per day across the mission.

--- Analyzing Metric: HF ---
           Mixed Linear Model Regression Results
============================================================
Model:               MixedLM  Dependent Variable:  hf       
No. Observations:    37       Method:              REML     
No. Groups:          8        Scale:               5188.7019
Min. group size:     2        Log-Likelihood:      -207.8407
Max. group size:     8        Converged:           Yes      
Mean group size:     4.6                                    
------------------------------------------------------------
                 Coef.  Std.Err.   z    P>|z|  [0.025 0.975]
------------------------------------------------------------
Intercept        40.869   27.782  1.471 0.141 -13.582 95.320
Sol              -0.099    3.138 -0.032 0.975  -6.249  6.051
Group Var       457.973   18.139                            
Group x Sol Cov  40.235                                     
Sol Var          10.097                                     
============================================================


Interpretation:
The fixed effect for 'Sol' (P-value: 0.9748) indicates that, on average, there is no significant decrease in HF per day across the mission.

--- Analyzing Metric: HFNU ---
          Mixed Linear Model Regression Results
==========================================================
Model:              MixedLM  Dependent Variable:  hfnu    
No. Observations:   37       Method:              REML    
No. Groups:         8        Scale:               7.9614  
Min. group size:    2        Log-Likelihood:      -95.1120
Max. group size:    8        Converged:           Yes     
Mean group size:    4.6                                   
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept        8.455    1.323  6.390 0.000  5.862 11.048
Sol             -0.232    0.126 -1.842 0.066 -0.478  0.015
Group Var        3.567    2.018                           
Group x Sol Cov  0.031                                    
Sol Var          0.000                                    
==========================================================


Interpretation:
The fixed effect for 'Sol' (P-value: 0.0655) indicates that, on average, there is no significant decrease in HFNU per day across the mission.

--- Analyzing Metric: SD1 ---
          Mixed Linear Model Regression Results
==========================================================
Model:             MixedLM  Dependent Variable:  sd1      
No. Observations:  37       Method:              REML     
No. Groups:        8        Scale:               23.9026  
Min. group size:   2        Log-Likelihood:      -114.4656
Max. group size:   8        Converged:           Yes      
Mean group size:   4.6                                    
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept       10.027    2.174  4.613 0.000  5.767 14.288
Sol             -0.171    0.199 -0.860 0.390 -0.561  0.219
Group Var       10.918    3.727                           
Group x Sol Cov  0.116                                    
Sol Var          0.001                                    
==========================================================


Interpretation:
The fixed effect for 'Sol' (P-value: 0.3898) indicates that, on average, there is no significant decrease in SD1 per day across the mission.

============================================================
 Recommendation 2: Running Generalized Additive Models (GAMs)
============================================================

--- Analyzing Metric: RMSSD ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -178.2912
Number of Samples:                           37 AIC:                                              379.8952
                                                AICc:                                             392.0156
                                                GCV:                                               98.0213
                                                Scale:                                             49.0325
                                                Pseudo R-Squared:                                   0.4969
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          7.65e-01                 
f(1)                              [0.6]                8            5.3          8.65e-03     **          
intercept                                              1            0.0          2.03e-06     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: PNNI_50 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                   -80.8561
Number of Samples:                           37 AIC:                                               185.025
                                                AICc:                                             197.1454
                                                GCV:                                                6.3395
                                                Scale:                                              3.1712
                                                Pseudo R-Squared:                                   0.4271
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.94e-01                 
f(1)                              [0.6]                8            5.3          4.11e-02     *           
intercept                                              1            0.0          1.70e-01                 
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: PNNI_20 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -156.9732
Number of Samples:                           37 AIC:                                              337.2592
                                                AICc:                                             349.3795
                                                GCV:                                               54.7781
                                                Scale:                                             27.4013
                                                Pseudo R-Squared:                                   0.4209
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          9.49e-01                 
f(1)                              [0.6]                8            5.3          4.71e-02     *           
intercept                                              1            0.0          2.86e-03     **          
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: HF ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                   -352.976
Number of Samples:                           37 AIC:                                              729.2647
                                                AICc:                                             741.3851
                                                GCV:                                            11087.6823
                                                Scale:                                           5546.3138
                                                Pseudo R-Squared:                                   0.4016
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.62e-01                 
f(1)                              [0.6]                8            5.3          5.52e-02     .           
intercept                                              1            0.0          1.63e-01                 
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: HFNU ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -110.8589
Number of Samples:                           37 AIC:                                              245.0305
                                                AICc:                                             257.1509
                                                GCV:                                               15.2293
                                                Scale:                                               7.618
                                                Pseudo R-Squared:                                   0.5816
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.23e-02     .           
f(1)                              [0.6]                8            5.3          4.28e-03     **          
intercept                                              1            0.0          1.88e-09     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: SD1 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -152.9113
Number of Samples:                           37 AIC:                                              329.1353
                                                AICc:                                             341.2557
                                                GCV:                                               49.0078
                                                Scale:                                             24.5148
                                                Pseudo R-Squared:                                   0.4969
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          7.65e-01                 
f(1)                              [0.6]                8            5.3          8.65e-03     **          
intercept                                              1            0.0          2.03e-06     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image

def init(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):

In [22]:
# =================================================================================
# Cell: Comprehensive Parasympathetic Time Analysis for Space Crew (Corrected)
# =================================================================================
# This notebook cell provides a complete, in-depth analysis of parasympathetic
# nervous system activity in space crew members throughout a mission timeline.
# It is designed to be run from top to bottom.
#
# The analysis is based on scientifically validated Heart Rate Variability (HRV) metrics.
# Parasympathetic tone, which reflects the body's "rest-and-digest" state, is
# assessed through three primary domains of HRV:
#
# 1.  **Time-Domain Metrics:** RMSSD, pNN50, pNN20
#     - These metrics quantify short-term variations in beat-to-beat intervals.
#     - Higher values generally indicate greater parasympathetic influence.
#
# 2.  **Frequency-Domain Metrics:** HF Power (High Frequency), HFnu (Normalized Units)
#     - HF power (0.15-0.4 Hz) is strongly associated with respiratory sinus arrhythmia
#       and is a widely accepted marker of parasympathetic (vagal) modulation.
#
# 3.  **Nonlinear Metrics:** SD1 (from Poincaré plot analysis)
#     - SD1 represents the standard deviation of the short-term RR interval variability
#       and is highly correlated with time-domain parasympathetic indices.
#
# This self-contained cell will:
# - Load and prepare the complete HRV dataset.
# - Calculate and display summary statistics for each crew member.
# - Perform detailed statistical analysis, including:
#   - ANOVA to test for differences between crew members.
#   - Correlation analysis to examine trends over the mission duration.
#   - Post-hoc tests (Tukey's HSD) to identify specific group differences.
# - Generate and display comprehensive visualizations for:
#   - Longitudinal trends for each individual.
#   - Comparisons between crew members.
#   - Analysis across different mission phases.
#   - Autonomic balance (sympathetic vs. parasympathetic).
# - Generate and print a full scientific report summarizing the findings.
# =================================================================================


# --- 1. Imports and Setup ---
# Import all necessary libraries for data handling, statistics, and plotting.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import f_oneway, ttest_ind, pearsonr, spearmanr
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import warnings

# Ignore common warnings for a cleaner, more readable output.
warnings.filterwarnings('ignore')

# --- 2. Plotting Style Configuration ---
# Set a professional and consistent style for all plots to ensure high-quality
# visualizations suitable for reports and presentations.

# CORRECTED: The style name uses a hyphen, not an underscore.
plt.style.use('seaborn-v0_8-darkgrid') # A visually appealing and informative style
sns.set_palette("husl", 8) # Use a color palette suitable for categorical data with 8 subjects
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 11
plt.rcParams['figure.dpi'] = 100 # Set a good resolution for the plots


# --- 3. The ParasympatheticAnalyzer Class ---
# We encapsulate the entire analysis within a class. This is a good practice for
# organizing complex analytical workflows, making the code reusable and easy to manage.

class ParasympatheticAnalyzer:
    """
    A comprehensive analyzer for parasympathetic nervous system activity
    in space crew members using validated HRV metrics.
    """

    def __init__(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
        """
        Initializes the analyzer by loading and preparing the HRV data.

        Parameters:
        - data_path (str): The relative path to the complete HRV results CSV file.
                           This path should be correct if the notebook is run from
                           the project's root directory ('Data/').
        """
        self.data_path = data_path
        self.data = None
        # Define the key parasympathetic metrics we will be analyzing.
        self.parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
        # Map subject IDs to their real names for more readable plots and reports.
        self.crew_names = {
            'T01_Mara': 'Mara', 'T02_Laura': 'Laura', 'T03_Nancy': 'Nancy',
            'T04_Michelle': 'Michelle', 'T05_Felicitas': 'Felicitas',
            'T06_Mara_Selena': 'Mara Selena', 'T07_Geraldinn': 'Geraldinn', 'T08_Karina': 'Karina'
        }
        self.load_data()

    def load_data(self):
        """
        This method handles loading the data from the CSV file and preparing it for analysis.
        This includes cleaning, sorting, and feature engineering (like creating mission phases).
        """
        print("--- Loading and Preparing Data ---")
        try:
            self.data = pd.read_csv(self.data_path)
            # Ensure data types are correct for analysis and plotting.
            self.data['Subject'] = self.data['Subject'].astype(str)
            self.data['Sol'] = self.data['Sol'].astype(int)
            # Add the crew names using the map defined earlier.
            self.data['Crew_Name'] = self.data['Subject'].map(self.crew_names)
            # Sort data by subject and time, crucial for any time-series analysis.
            self.data = self.data.sort_values(['Subject', 'Sol'])
            # Create mission phase categories based on the mission day (Sol).
            self.data['Mission_Phase'] = self.data['Sol'].apply(self._categorize_mission_phase)

            print(f"✓ Data loaded successfully: {len(self.data)} recordings found.")
            print(f"✓ Crew members found: {len(self.data['Subject'].unique())}")
            print(f"✓ Mission Day (Sol) range: {self.data['Sol'].min()} to {self.data['Sol'].max()}")

        except FileNotFoundError:
            print(f"✗ ERROR: Data file not found at '{self.data_path}'")
            print("  Please ensure the path is correct and your notebook is run from the project root directory.")
        except Exception as e:
            print(f"✗ An unexpected error occurred during data loading: {e}")

    def _categorize_mission_phase(self, sol):
        """A helper function to categorize mission days into distinct phases."""
        if sol <= 5:
            return 'Early Mission (Sols 1-5)'
        elif sol <= 10:
            return 'Mid Mission (Sols 6-10)'
        else:
            return 'Late Mission (Sols 11+)'

    def perform_statistical_analysis(self):
        """
        This is the core statistical engine of the analysis. It performs
        a suite of tests to uncover key insights from the data.
        """
        print("\n\n--- Performing Comprehensive Statistical Analysis ---")
        results = {}

        # --------------------------------------------------------------------------
        # Test 1: One-Way ANOVA (Analysis of Variance)
        # --------------------------------------------------------------------------
        # Purpose: To determine if there are any statistically significant differences
        # in the mean HRV values *between* the different crew members. A low p-value
        # (typically < 0.05) suggests that at least one crew member is different from the others.
        print("\n[1] One-Way ANOVA: Testing for differences between crew members...")
        for metric in self.parasympathetic_metrics:
            if metric in self.data.columns:
                groups = [self.data[self.data['Subject'] == crew][metric].dropna() for crew in self.data['Subject'].unique()]
                groups = [g for g in groups if len(g) > 0]
                if len(groups) >= 2:
                    f_stat, p_value = f_oneway(*groups)
                    results[f'anova_{metric}'] = {'F-statistic': f_stat, 'p-value': p_value}
                    sig_marker = '***' if p_value < 0.001 else '**' if p_value < 0.01 else '*' if p_value < 0.05 else 'ns (not significant)'
                    print(f"  - {metric.upper():<8}: F={f_stat:.3f}, p={p_value:.4f} ({sig_marker})")

        # --------------------------------------------------------------------------
        # Test 2: Correlation Analysis (Pearson & Spearman)
        # --------------------------------------------------------------------------
        # Purpose: To assess the relationship between mission time (Sol) and each HRV metric.
        # - Pearson correlation measures the *linear* relationship.
        # - Spearman correlation measures the *monotonic* relationship (whether it consistently increases or decreases, even if not in a straight line).
        # A significant p-value suggests a trend over time.
        print("\n[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...")
        for metric in self.parasympathetic_metrics:
            if metric in self.data.columns:
                clean_data = self.data[['Sol', metric]].dropna()
                if len(clean_data) > 3:
                    r_pearson, p_pearson = pearsonr(clean_data['Sol'], clean_data[metric])
                    r_spearman, p_spearman = spearmanr(clean_data['Sol'], clean_data[metric])
                    results[f'corr_{metric}'] = {'pearson_r': r_pearson, 'pearson_p': p_pearson, 'spearman_r': r_spearman, 'spearman_p': p_spearman}
                    print(f"  - {metric.upper():<8}: Spearman r={r_spearman:.3f} (p={p_spearman:.4f}), Pearson r={r_pearson:.3f} (p={p_pearson:.4f})")

        # --------------------------------------------------------------------------
        # Test 3: Post-Hoc Analysis (Tukey's HSD - Honest Significant Difference)
        # --------------------------------------------------------------------------
        # Purpose: When an ANOVA test is significant, it tells us that a difference exists
        # *somewhere* among the groups, but not *which specific* groups are different.
        # Tukey's HSD performs pairwise comparisons (e.g., Mara vs. Laura, Mara vs. Nancy, etc.)
        # to pinpoint exactly where the significant differences lie.
        print("\n[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...")
        for metric in self.parasympathetic_metrics:
            if f'anova_{metric}' in results and results[f'anova_{metric}']['p-value'] < 0.05:
                print(f"\n  - Tukey HSD results for {metric.upper()} (since ANOVA was significant):")
                clean_data = self.data[[metric, 'Subject']].dropna()
                tukey_result = pairwise_tukeyhsd(endog=clean_data[metric], groups=clean_data['Subject'], alpha=0.05)
                results[f'tukey_{metric}'] = tukey_result
                # Displaying the summary table from the test.
                print(tukey_result)

        self.statistical_results = results
        print("\n✓ Statistical analysis complete.")

    def plot_longitudinal_trends(self):
        """
        Creates comprehensive longitudinal trend plots for each metric.
        This visualization is crucial for observing how each crew member's physiology
        changes over the course of the 15-day mission.
        """
        print("\n--- Plotting Longitudinal Trends ---")
        fig, axes = plt.subplots(3, 2, figsize=(20, 22))
        axes = axes.flatten()

        for i, metric in enumerate(self.parasympathetic_metrics):
            ax = axes[i]
            # Plot individual trajectories for each crew member
            for subject_id, crew_data in self.data.groupby('Subject'):
                ax.plot(crew_data['Sol'], crew_data[metric], marker='o', linestyle='-', alpha=0.8, label=self.crew_names[subject_id])
                # Also plot a linear trend line to visualize the general direction
                if len(crew_data['Sol'].dropna()) > 1:
                    z = np.polyfit(crew_data['Sol'].dropna(), crew_data[metric].dropna(), 1)
                    p = np.poly1d(z)
                    ax.plot(crew_data['Sol'], p(crew_data['Sol']), linestyle='--', alpha=0.6, color=ax.get_lines()[-1].get_color())

            ax.set_title(f'Longitudinal Trend: {metric.upper()}', fontweight='bold')
            ax.set_xlabel('Sol (Mission Day)')
            ax.set_ylabel(f'{metric.upper()} Value')
            ax.grid(True, which='both', linestyle='--', linewidth=0.5)

        # Create a single, shared legend for the entire figure
        handles, labels = axes[0].get_legend_handles_labels()
        # We take only the first of every two lines (to avoid duplicate labels from the trend lines)
        unique_handles = [handles[i] for i in range(0, len(handles), 2)]
        unique_labels = [labels[i] for i in range(0, len(labels), 2)]
        fig.legend(unique_handles, unique_labels, loc='upper right', bbox_to_anchor=(1.1, 0.95), title="Crew Member")

        plt.tight_layout(rect=[0, 0, 0.9, 1]) # Adjust layout to make room for the legend
        plt.suptitle('Parasympathetic HRV Metrics Over Mission Time', fontsize=20, y=1.02)
        plt.show()

    def plot_mission_phase_analysis(self):
        """
        Analyzes and visualizes parasympathetic activity across the predefined mission phases.
        This helps to understand if there are adaptation effects in the early, mid, or late stages.
        """
        print("\n--- Plotting Mission Phase Analysis ---")
        fig, axes = plt.subplots(3, 2, figsize=(20, 22))
        axes = axes.flatten()
        phase_order = ['Early Mission (Sols 1-5)', 'Mid Mission (Sols 6-10)', 'Late Mission (Sols 11+)']

        for i, metric in enumerate(self.parasympathetic_metrics):
            ax = axes[i]
            # A boxplot shows the distribution (median, quartiles, range) of data for each phase.
            sns.boxplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, showfliers=False)
            # A stripplot overlays the individual data points to show the raw data distribution.
            sns.stripplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, color='black', alpha=0.5, size=4)

            ax.set_title(f'Analysis by Mission Phase: {metric.upper()}', fontweight='bold')
            ax.set_xlabel('Mission Phase')
            ax.set_ylabel(f'{metric.upper()} Value')
            ax.tick_params(axis='x', rotation=10) # Slightly rotate labels for readability

        plt.tight_layout()
        plt.suptitle('Parasympathetic HRV Metrics Across Mission Phases', fontsize=20, y=1.02)
        plt.show()

    def generate_scientific_report(self):
        """
        Generates and prints a formatted scientific report summarizing all the
        key findings from the statistical analysis.
        """
        print("\n\n" + "="*80)
        print("           COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT")
        print("="*80)

        report = [
            "\n### 1. INTRODUCTION ###",
            "This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.",
            "PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.",
            "The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.",

            "\n### 2. METHODOLOGY ###",
            "The following validated HRV metrics were used to assess parasympathetic activity:",
            "- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).",
            "- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.",
            "- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.",
            "\nStatistical analyses included One-Way ANOVA, Tukey's HSD for pairwise comparisons, and Spearman/Pearson correlation to assess trends over time.",

            "\n### 3. STATISTICAL RESULTS ###"
        ]

        # ANOVA Summary
        report.append("\n--- Between-Crew Differences (ANOVA) ---")
        report.append("This test checks if there are significant differences in the average metric values among crew members over the entire mission.")
        for metric in self.parasympathetic_metrics:
            if f'anova_{metric}' in self.statistical_results:
                res = self.statistical_results[f'anova_{metric}']
                sig = 'p < 0.05, SIGNIFICANT' if res['p-value'] < 0.05 else 'p > 0.05, not significant'
                report.append(f"  - {metric.upper()}: F={res['F-statistic']:.2f}, p={res['p-value']:.3f} ({sig})")

        # Correlation Summary
        report.append("\n--- Mission Time Trend Analysis (Correlation) ---")
        report.append("This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.")
        for metric in self.parasympathetic_metrics:
            if f'corr_{metric}' in self.statistical_results:
                res = self.statistical_results[f'corr_{metric}']
                sig = 'p < 0.05, SIGNIFICANT' if res['spearman_p'] < 0.05 else 'p > 0.05, not significant'
                report.append(f"  - {metric.upper()}: Spearman r={res['spearman_r']:.3f} (p={res['spearman_p']:.3f}, {sig})")

        # Tukey HSD Summary
        report.append("\n--- Pairwise Crew Comparisons (Tukey's HSD) ---")
        report.append("For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.")
        for metric in self.parasympathetic_metrics:
            if f'tukey_{metric}' in self.statistical_results:
                report.append(f"\n  Significant differences for {metric.upper()}:")
                report.append(str(self.statistical_results[f'tukey_{metric}']))

        print("\n".join(report))
        print("\n" + "="*80)
        print("                         END OF REPORT")
        print("="*80)


# --- 4. Main Execution Block ---
# This is where we create an instance of our analyzer and run all the methods
# in a logical sequence.

# Create the analyzer object. This will automatically load and prepare the data.
analyzer = ParasympatheticAnalyzer()

# Check if data was loaded successfully before proceeding.
if analyzer.data is not None:
    # Run all the statistical tests and store the results.
    analyzer.perform_statistical_analysis()

    # Generate and display the plots.
    analyzer.plot_longitudinal_trends()
    analyzer.plot_mission_phase_analysis()

    # Generate and print the final, formatted text report.
    analyzer.generate_scientific_report()
else:
    print("\nAnalysis could not proceed because data failed to load.")
--- Loading and Preparing Data ---
✓ Data loaded successfully: 37 recordings found.
✓ Crew members found: 8
✓ Mission Day (Sol) range: 2 to 16


--- Performing Comprehensive Statistical Analysis ---

[1] One-Way ANOVA: Testing for differences between crew members...
  - RMSSD   : F=3.400, p=0.0090 (**)
  - PNNI_50 : F=2.485, p=0.0396 (*)
  - PNNI_20 : F=2.354, p=0.0493 (*)
  - HF      : F=2.320, p=0.0523 (ns (not significant))
  - HFNU    : F=3.547, p=0.0071 (**)
  - SD1     : F=3.399, p=0.0090 (**)

[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...
  - RMSSD   : Spearman r=-0.279 (p=0.0945), Pearson r=-0.118 (p=0.4856)
  - PNNI_50 : Spearman r=-0.308 (p=0.0640), Pearson r=-0.099 (p=0.5607)
  - PNNI_20 : Spearman r=-0.241 (p=0.1500), Pearson r=-0.135 (p=0.4271)
  - HF      : Spearman r=-0.214 (p=0.2033), Pearson r=-0.001 (p=0.9933)
  - HFNU    : Spearman r=-0.318 (p=0.0553), Pearson r=-0.279 (p=0.0946)
  - SD1     : Spearman r=-0.278 (p=0.0962), Pearson r=-0.118 (p=0.4855)

[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...

  - Tukey HSD results for RMSSD (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura    -2.56 0.9993 -17.9577 12.8377  False
       T01_Mara       T03_Nancy    0.285    1.0 -13.6428 14.2128  False
       T01_Mara    T04_Michelle  -3.1225 0.9952 -17.0503 10.8053  False
       T01_Mara   T05_Felicitas    13.15 0.0293   0.8669 25.4331   True
       T01_Mara T06_Mara_Selena   -1.345    1.0 -13.6281 10.9381  False
       T01_Mara   T07_Geraldinn   -2.105 0.9996 -16.0328 11.8228  False
       T01_Mara      T08_Karina   -5.495 0.9713 -23.4757 12.4857  False
      T02_Laura       T03_Nancy    2.845 0.9993  -14.526  20.216  False
      T02_Laura    T04_Michelle  -0.5625    1.0 -17.9335 16.8085  False
      T02_Laura   T05_Felicitas    15.71 0.0593  -0.3724 31.7924  False
      T02_Laura T06_Mara_Selena    1.215    1.0 -14.8674 17.2974  False
      T02_Laura   T07_Geraldinn    0.455    1.0  -16.916  17.826  False
      T02_Laura      T08_Karina   -2.935 0.9997 -23.6973 17.8273  False
      T03_Nancy    T04_Michelle  -3.4075 0.9966 -19.4899 12.6749  False
      T03_Nancy   T05_Felicitas   12.865 0.1193  -1.8161 27.5461  False
      T03_Nancy T06_Mara_Selena    -1.63    1.0 -16.3111 13.0511  False
      T03_Nancy   T07_Geraldinn    -2.39 0.9997 -18.4724 13.6924  False
      T03_Nancy      T08_Karina    -5.78  0.977 -25.4768 13.9168  False
   T04_Michelle   T05_Felicitas  16.2725 0.0218   1.5914 30.9536   True
   T04_Michelle T06_Mara_Selena   1.7775 0.9999 -12.9036 16.4586  False
   T04_Michelle   T07_Geraldinn   1.0175    1.0 -15.0649 17.0999  False
   T04_Michelle      T08_Karina  -2.3725 0.9999 -22.0693 17.3243  False
  T05_Felicitas T06_Mara_Selena  -14.495 0.0226 -27.6262 -1.3638   True
  T05_Felicitas   T07_Geraldinn  -15.255 0.0373 -29.9361 -0.5739   True
  T05_Felicitas      T08_Karina  -18.645 0.0485 -37.2153 -0.0747   True
T06_Mara_Selena   T07_Geraldinn    -0.76    1.0 -15.4411 13.9211  False
T06_Mara_Selena      T08_Karina    -4.15 0.9953 -22.7203 14.4203  False
  T07_Geraldinn      T08_Karina    -3.39 0.9991 -23.0868 16.3068  False
-----------------------------------------------------------------------

  - Tukey HSD results for PNNI_50 (since ANOVA was significant):
         Multiple Comparison of Means - Tukey HSD, FWER=0.05         
=====================================================================
     group1          group2     meandiff p-adj   lower  upper  reject
---------------------------------------------------------------------
       T01_Mara       T02_Laura  -0.1483    1.0  -4.063 3.7663  False
       T01_Mara       T03_Nancy   0.3775    1.0 -3.1634 3.9184  False
       T01_Mara    T04_Michelle  -0.3125    1.0 -3.8534 3.2284  False
       T01_Mara   T05_Felicitas    3.225 0.0391  0.1022 6.3478   True
       T01_Mara T06_Mara_Selena   0.2717    1.0 -2.8511 3.3945  False
       T01_Mara   T07_Geraldinn  -0.2375    1.0 -3.7784 3.3034  False
       T01_Mara      T08_Karina   -0.415    1.0 -4.9863 4.1563  False
      T02_Laura       T03_Nancy   0.5258 0.9999 -3.8905 4.9421  False
      T02_Laura    T04_Michelle  -0.1642    1.0 -4.5805 4.2521  False
      T02_Laura   T05_Felicitas   3.3733 0.1658 -0.7154  7.462  False
      T02_Laura T06_Mara_Selena     0.42    1.0 -3.6687 4.5087  False
      T02_Laura   T07_Geraldinn  -0.0892    1.0 -4.5055 4.3271  False
      T02_Laura      T08_Karina  -0.2667    1.0 -5.5451 5.0118  False
      T03_Nancy    T04_Michelle    -0.69 0.9992 -4.7787 3.3987  False
      T03_Nancy   T05_Felicitas   2.8475 0.2399 -0.8849 6.5799  False
      T03_Nancy T06_Mara_Selena  -0.1058    1.0 -3.8383 3.6266  False
      T03_Nancy   T07_Geraldinn   -0.615 0.9996 -4.7037 3.4737  False
      T03_Nancy      T08_Karina  -0.7925 0.9995 -5.8001 4.2151  False
   T04_Michelle   T05_Felicitas   3.5375  0.073 -0.1949 7.2699  False
   T04_Michelle T06_Mara_Selena   0.5842 0.9995 -3.1483 4.3166  False
   T04_Michelle   T07_Geraldinn    0.075    1.0 -4.0137 4.1637  False
   T04_Michelle      T08_Karina  -0.1025    1.0 -5.1101 4.9051  False
  T05_Felicitas T06_Mara_Selena  -2.9533 0.1128 -6.2917 0.3851  False
  T05_Felicitas   T07_Geraldinn  -3.4625 0.0841 -7.1949 0.2699  False
  T05_Felicitas      T08_Karina    -3.64 0.2291 -8.3612 1.0812  False
T06_Mara_Selena   T07_Geraldinn  -0.5092 0.9998 -4.2416 3.2233  False
T06_Mara_Selena      T08_Karina  -0.6867 0.9997 -5.4079 4.0345  False
  T07_Geraldinn      T08_Karina  -0.1775    1.0 -5.1851 4.8301  False
---------------------------------------------------------------------

  - Tukey HSD results for PNNI_20 (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura   0.6396    1.0  -10.921 12.2002  False
       T01_Mara       T03_Nancy   3.3688 0.9619  -7.0882 13.8257  False
       T01_Mara    T04_Michelle   0.2713    1.0 -10.1857 10.7282  False
       T01_Mara   T05_Felicitas   9.8429   0.03   0.6207 19.0651   True
       T01_Mara T06_Mara_Selena   1.5379 0.9993  -7.6843 10.7601  False
       T01_Mara   T07_Geraldinn   1.3563 0.9999  -9.1007 11.8132  False
       T01_Mara      T08_Karina  -1.6837 0.9999 -15.1836 11.8161  False
      T02_Laura       T03_Nancy   2.7292 0.9969  -10.313 15.7713  False
      T02_Laura    T04_Michelle  -0.3683    1.0 -13.4105 12.6738  False
      T02_Laura   T05_Felicitas   9.2033 0.2408  -2.8713  21.278  False
      T02_Laura T06_Mara_Selena   0.8983    1.0 -11.1763  12.973  False
      T02_Laura   T07_Geraldinn   0.7167    1.0 -12.3255 13.7588  False
      T02_Laura      T08_Karina  -2.3233 0.9996 -17.9117  13.265  False
      T03_Nancy    T04_Michelle  -3.0975 0.9893 -15.1722  8.9772  False
      T03_Nancy   T05_Felicitas   6.4742 0.5518  -4.5484 17.4968  False
      T03_Nancy T06_Mara_Selena  -1.8308 0.9993 -12.8534  9.1918  False
      T03_Nancy   T07_Geraldinn  -2.0125 0.9993 -14.0872 10.0622  False
      T03_Nancy      T08_Karina  -5.0525 0.9483 -19.8409  9.7359  False
   T04_Michelle   T05_Felicitas   9.5717 0.1257  -1.4509 20.5943  False
   T04_Michelle T06_Mara_Selena   1.2667 0.9999  -9.7559 12.2893  False
   T04_Michelle   T07_Geraldinn    1.085    1.0 -10.9897 13.1597  False
   T04_Michelle      T08_Karina   -1.955 0.9998 -16.7434 12.8334  False
  T05_Felicitas T06_Mara_Selena   -8.305 0.1487 -18.1639  1.5539  False
  T05_Felicitas   T07_Geraldinn  -8.4867 0.2305 -19.5093  2.5359  False
  T05_Felicitas      T08_Karina -11.5267  0.164 -25.4693   2.416  False
T06_Mara_Selena   T07_Geraldinn  -0.1817    1.0 -11.2043 10.8409  False
T06_Mara_Selena      T08_Karina  -3.2217 0.9943 -17.1643  10.721  False
  T07_Geraldinn      T08_Karina    -3.04 0.9972 -17.8284 11.7484  False
-----------------------------------------------------------------------

  - Tukey HSD results for HFNU (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -2.7446 0.8683  -9.3362   3.847  False
       T01_Mara       T03_Nancy  -4.5238 0.2456 -10.4861  1.4386  False
       T01_Mara    T04_Michelle  -6.6712   0.02 -12.6336 -0.7089   True
       T01_Mara   T05_Felicitas  -2.1363 0.8818  -7.3945   3.122  False
       T01_Mara T06_Mara_Selena  -6.3529 0.0097 -11.6112 -1.0947   True
       T01_Mara   T07_Geraldinn  -4.8713 0.1742 -10.8336  1.0911  False
       T01_Mara      T08_Karina  -5.7813 0.2563 -13.4786  1.9161  False
      T02_Laura       T03_Nancy  -1.7792 0.9929  -9.2155  5.6571  False
      T02_Laura    T04_Michelle  -3.9267 0.6735  -11.363  3.5096  False
      T02_Laura   T05_Felicitas   0.6083    1.0  -6.2764   7.493  False
      T02_Laura T06_Mara_Selena  -3.6083 0.6814  -10.493  3.2764  False
      T02_Laura   T07_Geraldinn  -2.1267 0.9801   -9.563  5.3096  False
      T02_Laura      T08_Karina  -3.0367 0.9483 -11.9248  5.8514  False
      T03_Nancy    T04_Michelle  -2.1475 0.9679  -9.0322  4.7372  False
      T03_Nancy   T05_Felicitas   2.3875 0.9131  -3.8973  8.6723  False
      T03_Nancy T06_Mara_Selena  -1.8292  0.978   -8.114  4.4557  False
      T03_Nancy   T07_Geraldinn  -0.3475    1.0  -7.2322  6.5372  False
      T03_Nancy      T08_Karina  -1.2575 0.9996  -9.6895  7.1745  False
   T04_Michelle   T05_Felicitas    4.535 0.3007  -1.7498 10.8198  False
   T04_Michelle T06_Mara_Selena   0.3183    1.0  -5.9665  6.6032  False
   T04_Michelle   T07_Geraldinn      1.8  0.988  -5.0847  8.6847  False
   T04_Michelle      T08_Karina     0.89    1.0   -7.542   9.322  False
  T05_Felicitas T06_Mara_Selena  -4.2167 0.2577   -9.838  1.4047  False
  T05_Felicitas   T07_Geraldinn   -2.735  0.841  -9.0198  3.5498  False
  T05_Felicitas      T08_Karina   -3.645 0.8036 -11.5947  4.3047  False
T06_Mara_Selena   T07_Geraldinn   1.4817 0.9935  -4.8032  7.7665  False
T06_Mara_Selena      T08_Karina   0.5717    1.0  -7.3781  8.5214  False
  T07_Geraldinn      T08_Karina    -0.91    1.0   -9.342   7.522  False
-----------------------------------------------------------------------

  - Tukey HSD results for SD1 (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -1.8121 0.9993 -12.7001  9.0759  False
       T01_Mara       T03_Nancy   0.2012    1.0  -9.6473 10.0498  False
       T01_Mara    T04_Michelle  -2.2062 0.9952 -12.0548  7.6423  False
       T01_Mara   T05_Felicitas   9.2979 0.0293   0.6123 17.9835   True
       T01_Mara T06_Mara_Selena  -0.9521    1.0  -9.6377  7.7335  False
       T01_Mara   T07_Geraldinn  -1.4888 0.9996 -11.3373  8.3598  False
       T01_Mara      T08_Karina  -3.8837 0.9713 -16.5982  8.8307  False
      T02_Laura       T03_Nancy   2.0133 0.9993   -10.27 14.2966  False
      T02_Laura    T04_Michelle  -0.3942    1.0 -12.6775 11.8891  False
      T02_Laura   T05_Felicitas    11.11 0.0592  -0.2621 22.4821  False
      T02_Laura T06_Mara_Selena     0.86    1.0 -10.5121 12.2321  False
      T02_Laura   T07_Geraldinn   0.3233    1.0   -11.96 12.6066  False
      T02_Laura      T08_Karina  -2.0717 0.9998  -16.753 12.6097  False
      T03_Nancy    T04_Michelle  -2.4075 0.9966 -13.7796  8.9646  False
      T03_Nancy   T05_Felicitas   9.0967 0.1193  -1.2846 19.4779  False
      T03_Nancy T06_Mara_Selena  -1.1533    1.0 -11.5346  9.2279  False
      T03_Nancy   T07_Geraldinn    -1.69 0.9997 -13.0621  9.6821  False
      T03_Nancy      T08_Karina   -4.085 0.9771 -18.0129  9.8429  False
   T04_Michelle   T05_Felicitas  11.5042 0.0218   1.1229 21.8854   True
   T04_Michelle T06_Mara_Selena   1.2542 0.9999  -9.1271 11.6354  False
   T04_Michelle   T07_Geraldinn   0.7175    1.0 -10.6546 12.0896  False
   T04_Michelle      T08_Karina  -1.6775 0.9999 -15.6054 12.2504  False
  T05_Felicitas T06_Mara_Selena   -10.25 0.0226 -19.5353 -0.9647   True
  T05_Felicitas   T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054   True
  T05_Felicitas      T08_Karina -13.1817 0.0486 -26.3131 -0.0503   True
T06_Mara_Selena   T07_Geraldinn  -0.5367    1.0 -10.9179  9.8446  False
T06_Mara_Selena      T08_Karina  -2.9317 0.9953 -16.0631 10.1997  False
  T07_Geraldinn      T08_Karina   -2.395 0.9991 -16.3229 11.5329  False
-----------------------------------------------------------------------

✓ Statistical analysis complete.

--- Plotting Longitudinal Trends ---
No description has been provided for this image
--- Plotting Mission Phase Analysis ---
No description has been provided for this image

================================================================================
           COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT
================================================================================

### 1. INTRODUCTION ###
This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.
PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.
The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.

### 2. METHODOLOGY ###
The following validated HRV metrics were used to assess parasympathetic activity:
- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).
- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.
- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.

Statistical analyses included One-Way ANOVA, Tukey's HSD for pairwise comparisons, and Spearman/Pearson correlation to assess trends over time.

### 3. STATISTICAL RESULTS ###

--- Between-Crew Differences (ANOVA) ---
This test checks if there are significant differences in the average metric values among crew members over the entire mission.
  - RMSSD: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)
  - PNNI_50: F=2.49, p=0.040 (p < 0.05, SIGNIFICANT)
  - PNNI_20: F=2.35, p=0.049 (p < 0.05, SIGNIFICANT)
  - HF: F=2.32, p=0.052 (p > 0.05, not significant)
  - HFNU: F=3.55, p=0.007 (p < 0.05, SIGNIFICANT)
  - SD1: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)

--- Mission Time Trend Analysis (Correlation) ---
This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.
  - RMSSD: Spearman r=-0.279 (p=0.094, p > 0.05, not significant)
  - PNNI_50: Spearman r=-0.308 (p=0.064, p > 0.05, not significant)
  - PNNI_20: Spearman r=-0.241 (p=0.150, p > 0.05, not significant)
  - HF: Spearman r=-0.214 (p=0.203, p > 0.05, not significant)
  - HFNU: Spearman r=-0.318 (p=0.055, p > 0.05, not significant)
  - SD1: Spearman r=-0.278 (p=0.096, p > 0.05, not significant)

--- Pairwise Crew Comparisons (Tukey's HSD) ---
For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.

  Significant differences for RMSSD:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura    -2.56 0.9993 -17.9577 12.8377  False
       T01_Mara       T03_Nancy    0.285    1.0 -13.6428 14.2128  False
       T01_Mara    T04_Michelle  -3.1225 0.9952 -17.0503 10.8053  False
       T01_Mara   T05_Felicitas    13.15 0.0293   0.8669 25.4331   True
       T01_Mara T06_Mara_Selena   -1.345    1.0 -13.6281 10.9381  False
       T01_Mara   T07_Geraldinn   -2.105 0.9996 -16.0328 11.8228  False
       T01_Mara      T08_Karina   -5.495 0.9713 -23.4757 12.4857  False
      T02_Laura       T03_Nancy    2.845 0.9993  -14.526  20.216  False
      T02_Laura    T04_Michelle  -0.5625    1.0 -17.9335 16.8085  False
      T02_Laura   T05_Felicitas    15.71 0.0593  -0.3724 31.7924  False
      T02_Laura T06_Mara_Selena    1.215    1.0 -14.8674 17.2974  False
      T02_Laura   T07_Geraldinn    0.455    1.0  -16.916  17.826  False
      T02_Laura      T08_Karina   -2.935 0.9997 -23.6973 17.8273  False
      T03_Nancy    T04_Michelle  -3.4075 0.9966 -19.4899 12.6749  False
      T03_Nancy   T05_Felicitas   12.865 0.1193  -1.8161 27.5461  False
      T03_Nancy T06_Mara_Selena    -1.63    1.0 -16.3111 13.0511  False
      T03_Nancy   T07_Geraldinn    -2.39 0.9997 -18.4724 13.6924  False
      T03_Nancy      T08_Karina    -5.78  0.977 -25.4768 13.9168  False
   T04_Michelle   T05_Felicitas  16.2725 0.0218   1.5914 30.9536   True
   T04_Michelle T06_Mara_Selena   1.7775 0.9999 -12.9036 16.4586  False
   T04_Michelle   T07_Geraldinn   1.0175    1.0 -15.0649 17.0999  False
   T04_Michelle      T08_Karina  -2.3725 0.9999 -22.0693 17.3243  False
  T05_Felicitas T06_Mara_Selena  -14.495 0.0226 -27.6262 -1.3638   True
  T05_Felicitas   T07_Geraldinn  -15.255 0.0373 -29.9361 -0.5739   True
  T05_Felicitas      T08_Karina  -18.645 0.0485 -37.2153 -0.0747   True
T06_Mara_Selena   T07_Geraldinn    -0.76    1.0 -15.4411 13.9211  False
T06_Mara_Selena      T08_Karina    -4.15 0.9953 -22.7203 14.4203  False
  T07_Geraldinn      T08_Karina    -3.39 0.9991 -23.0868 16.3068  False
-----------------------------------------------------------------------

  Significant differences for PNNI_50:
         Multiple Comparison of Means - Tukey HSD, FWER=0.05         
=====================================================================
     group1          group2     meandiff p-adj   lower  upper  reject
---------------------------------------------------------------------
       T01_Mara       T02_Laura  -0.1483    1.0  -4.063 3.7663  False
       T01_Mara       T03_Nancy   0.3775    1.0 -3.1634 3.9184  False
       T01_Mara    T04_Michelle  -0.3125    1.0 -3.8534 3.2284  False
       T01_Mara   T05_Felicitas    3.225 0.0391  0.1022 6.3478   True
       T01_Mara T06_Mara_Selena   0.2717    1.0 -2.8511 3.3945  False
       T01_Mara   T07_Geraldinn  -0.2375    1.0 -3.7784 3.3034  False
       T01_Mara      T08_Karina   -0.415    1.0 -4.9863 4.1563  False
      T02_Laura       T03_Nancy   0.5258 0.9999 -3.8905 4.9421  False
      T02_Laura    T04_Michelle  -0.1642    1.0 -4.5805 4.2521  False
      T02_Laura   T05_Felicitas   3.3733 0.1658 -0.7154  7.462  False
      T02_Laura T06_Mara_Selena     0.42    1.0 -3.6687 4.5087  False
      T02_Laura   T07_Geraldinn  -0.0892    1.0 -4.5055 4.3271  False
      T02_Laura      T08_Karina  -0.2667    1.0 -5.5451 5.0118  False
      T03_Nancy    T04_Michelle    -0.69 0.9992 -4.7787 3.3987  False
      T03_Nancy   T05_Felicitas   2.8475 0.2399 -0.8849 6.5799  False
      T03_Nancy T06_Mara_Selena  -0.1058    1.0 -3.8383 3.6266  False
      T03_Nancy   T07_Geraldinn   -0.615 0.9996 -4.7037 3.4737  False
      T03_Nancy      T08_Karina  -0.7925 0.9995 -5.8001 4.2151  False
   T04_Michelle   T05_Felicitas   3.5375  0.073 -0.1949 7.2699  False
   T04_Michelle T06_Mara_Selena   0.5842 0.9995 -3.1483 4.3166  False
   T04_Michelle   T07_Geraldinn    0.075    1.0 -4.0137 4.1637  False
   T04_Michelle      T08_Karina  -0.1025    1.0 -5.1101 4.9051  False
  T05_Felicitas T06_Mara_Selena  -2.9533 0.1128 -6.2917 0.3851  False
  T05_Felicitas   T07_Geraldinn  -3.4625 0.0841 -7.1949 0.2699  False
  T05_Felicitas      T08_Karina    -3.64 0.2291 -8.3612 1.0812  False
T06_Mara_Selena   T07_Geraldinn  -0.5092 0.9998 -4.2416 3.2233  False
T06_Mara_Selena      T08_Karina  -0.6867 0.9997 -5.4079 4.0345  False
  T07_Geraldinn      T08_Karina  -0.1775    1.0 -5.1851 4.8301  False
---------------------------------------------------------------------

  Significant differences for PNNI_20:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura   0.6396    1.0  -10.921 12.2002  False
       T01_Mara       T03_Nancy   3.3688 0.9619  -7.0882 13.8257  False
       T01_Mara    T04_Michelle   0.2713    1.0 -10.1857 10.7282  False
       T01_Mara   T05_Felicitas   9.8429   0.03   0.6207 19.0651   True
       T01_Mara T06_Mara_Selena   1.5379 0.9993  -7.6843 10.7601  False
       T01_Mara   T07_Geraldinn   1.3563 0.9999  -9.1007 11.8132  False
       T01_Mara      T08_Karina  -1.6837 0.9999 -15.1836 11.8161  False
      T02_Laura       T03_Nancy   2.7292 0.9969  -10.313 15.7713  False
      T02_Laura    T04_Michelle  -0.3683    1.0 -13.4105 12.6738  False
      T02_Laura   T05_Felicitas   9.2033 0.2408  -2.8713  21.278  False
      T02_Laura T06_Mara_Selena   0.8983    1.0 -11.1763  12.973  False
      T02_Laura   T07_Geraldinn   0.7167    1.0 -12.3255 13.7588  False
      T02_Laura      T08_Karina  -2.3233 0.9996 -17.9117  13.265  False
      T03_Nancy    T04_Michelle  -3.0975 0.9893 -15.1722  8.9772  False
      T03_Nancy   T05_Felicitas   6.4742 0.5518  -4.5484 17.4968  False
      T03_Nancy T06_Mara_Selena  -1.8308 0.9993 -12.8534  9.1918  False
      T03_Nancy   T07_Geraldinn  -2.0125 0.9993 -14.0872 10.0622  False
      T03_Nancy      T08_Karina  -5.0525 0.9483 -19.8409  9.7359  False
   T04_Michelle   T05_Felicitas   9.5717 0.1257  -1.4509 20.5943  False
   T04_Michelle T06_Mara_Selena   1.2667 0.9999  -9.7559 12.2893  False
   T04_Michelle   T07_Geraldinn    1.085    1.0 -10.9897 13.1597  False
   T04_Michelle      T08_Karina   -1.955 0.9998 -16.7434 12.8334  False
  T05_Felicitas T06_Mara_Selena   -8.305 0.1487 -18.1639  1.5539  False
  T05_Felicitas   T07_Geraldinn  -8.4867 0.2305 -19.5093  2.5359  False
  T05_Felicitas      T08_Karina -11.5267  0.164 -25.4693   2.416  False
T06_Mara_Selena   T07_Geraldinn  -0.1817    1.0 -11.2043 10.8409  False
T06_Mara_Selena      T08_Karina  -3.2217 0.9943 -17.1643  10.721  False
  T07_Geraldinn      T08_Karina    -3.04 0.9972 -17.8284 11.7484  False
-----------------------------------------------------------------------

  Significant differences for HFNU:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -2.7446 0.8683  -9.3362   3.847  False
       T01_Mara       T03_Nancy  -4.5238 0.2456 -10.4861  1.4386  False
       T01_Mara    T04_Michelle  -6.6712   0.02 -12.6336 -0.7089   True
       T01_Mara   T05_Felicitas  -2.1363 0.8818  -7.3945   3.122  False
       T01_Mara T06_Mara_Selena  -6.3529 0.0097 -11.6112 -1.0947   True
       T01_Mara   T07_Geraldinn  -4.8713 0.1742 -10.8336  1.0911  False
       T01_Mara      T08_Karina  -5.7813 0.2563 -13.4786  1.9161  False
      T02_Laura       T03_Nancy  -1.7792 0.9929  -9.2155  5.6571  False
      T02_Laura    T04_Michelle  -3.9267 0.6735  -11.363  3.5096  False
      T02_Laura   T05_Felicitas   0.6083    1.0  -6.2764   7.493  False
      T02_Laura T06_Mara_Selena  -3.6083 0.6814  -10.493  3.2764  False
      T02_Laura   T07_Geraldinn  -2.1267 0.9801   -9.563  5.3096  False
      T02_Laura      T08_Karina  -3.0367 0.9483 -11.9248  5.8514  False
      T03_Nancy    T04_Michelle  -2.1475 0.9679  -9.0322  4.7372  False
      T03_Nancy   T05_Felicitas   2.3875 0.9131  -3.8973  8.6723  False
      T03_Nancy T06_Mara_Selena  -1.8292  0.978   -8.114  4.4557  False
      T03_Nancy   T07_Geraldinn  -0.3475    1.0  -7.2322  6.5372  False
      T03_Nancy      T08_Karina  -1.2575 0.9996  -9.6895  7.1745  False
   T04_Michelle   T05_Felicitas    4.535 0.3007  -1.7498 10.8198  False
   T04_Michelle T06_Mara_Selena   0.3183    1.0  -5.9665  6.6032  False
   T04_Michelle   T07_Geraldinn      1.8  0.988  -5.0847  8.6847  False
   T04_Michelle      T08_Karina     0.89    1.0   -7.542   9.322  False
  T05_Felicitas T06_Mara_Selena  -4.2167 0.2577   -9.838  1.4047  False
  T05_Felicitas   T07_Geraldinn   -2.735  0.841  -9.0198  3.5498  False
  T05_Felicitas      T08_Karina   -3.645 0.8036 -11.5947  4.3047  False
T06_Mara_Selena   T07_Geraldinn   1.4817 0.9935  -4.8032  7.7665  False
T06_Mara_Selena      T08_Karina   0.5717    1.0  -7.3781  8.5214  False
  T07_Geraldinn      T08_Karina    -0.91    1.0   -9.342   7.522  False
-----------------------------------------------------------------------

  Significant differences for SD1:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -1.8121 0.9993 -12.7001  9.0759  False
       T01_Mara       T03_Nancy   0.2012    1.0  -9.6473 10.0498  False
       T01_Mara    T04_Michelle  -2.2062 0.9952 -12.0548  7.6423  False
       T01_Mara   T05_Felicitas   9.2979 0.0293   0.6123 17.9835   True
       T01_Mara T06_Mara_Selena  -0.9521    1.0  -9.6377  7.7335  False
       T01_Mara   T07_Geraldinn  -1.4888 0.9996 -11.3373  8.3598  False
       T01_Mara      T08_Karina  -3.8837 0.9713 -16.5982  8.8307  False
      T02_Laura       T03_Nancy   2.0133 0.9993   -10.27 14.2966  False
      T02_Laura    T04_Michelle  -0.3942    1.0 -12.6775 11.8891  False
      T02_Laura   T05_Felicitas    11.11 0.0592  -0.2621 22.4821  False
      T02_Laura T06_Mara_Selena     0.86    1.0 -10.5121 12.2321  False
      T02_Laura   T07_Geraldinn   0.3233    1.0   -11.96 12.6066  False
      T02_Laura      T08_Karina  -2.0717 0.9998  -16.753 12.6097  False
      T03_Nancy    T04_Michelle  -2.4075 0.9966 -13.7796  8.9646  False
      T03_Nancy   T05_Felicitas   9.0967 0.1193  -1.2846 19.4779  False
      T03_Nancy T06_Mara_Selena  -1.1533    1.0 -11.5346  9.2279  False
      T03_Nancy   T07_Geraldinn    -1.69 0.9997 -13.0621  9.6821  False
      T03_Nancy      T08_Karina   -4.085 0.9771 -18.0129  9.8429  False
   T04_Michelle   T05_Felicitas  11.5042 0.0218   1.1229 21.8854   True
   T04_Michelle T06_Mara_Selena   1.2542 0.9999  -9.1271 11.6354  False
   T04_Michelle   T07_Geraldinn   0.7175    1.0 -10.6546 12.0896  False
   T04_Michelle      T08_Karina  -1.6775 0.9999 -15.6054 12.2504  False
  T05_Felicitas T06_Mara_Selena   -10.25 0.0226 -19.5353 -0.9647   True
  T05_Felicitas   T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054   True
  T05_Felicitas      T08_Karina -13.1817 0.0486 -26.3131 -0.0503   True
T06_Mara_Selena   T07_Geraldinn  -0.5367    1.0 -10.9179  9.8446  False
T06_Mara_Selena      T08_Karina  -2.9317 0.9953 -16.0631 10.1997  False
  T07_Geraldinn      T08_Karina   -2.395 0.9991 -16.3229 11.5329  False
-----------------------------------------------------------------------

================================================================================
                         END OF REPORT
================================================================================
In [23]:
# =================================================================================
# Cell: Comprehensive Parasympathetic and GAM Time Series Analysis
# =================================================================================
# This notebook cell provides a complete, in-depth analysis of parasympathetic
# nervous system activity, including Generalized Additive Models (GAMs)
# to capture non-linear trends over time.
#
# The analysis is based on scientifically validated Heart Rate Variability (HRV) metrics.
#
# It will perform the following steps:
# 1.  Load and prepare the complete HRV dataset from the specified hardcoded path.
# 2.  Perform detailed statistical analysis (ANOVA, Correlation, Tukey's HSD).
# 3.  Fit a Generalized Additive Model (GAM) for each metric to visualize
#     non-linear changes over the mission.
# 4.  Generate and display comprehensive visualizations for all analyses.
# 5.  Generate and print a full scientific report summarizing all findings.
# =================================================================================


# --- 1. Imports and Setup ---
# Import all necessary libraries for data handling, statistics, and plotting.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import f_oneway, pearsonr, spearmanr
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from pygam import GAM, s, f # Import for Generalized Additive Models
import warnings

# Ignore common warnings for a cleaner, more readable output.
warnings.filterwarnings('ignore')

# --- 2. Plotting Style Configuration ---
# Set a professional and consistent style for all plots.
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl", 8)
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 11
plt.rcParams['figure.dpi'] = 100


# --- 3. The ParasympatheticAnalyzer Class ---
# The entire analysis is encapsulated within this class for organization and reusability.

class ParasympatheticAnalyzer:
    """
    A comprehensive analyzer for parasympathetic nervous system activity
    in space crew members using validated HRV metrics and advanced modeling.
    """

    # Using the exact hardcoded path as requested.
    def __init__(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
        """
        Initializes the analyzer by loading and preparing the HRV data from the specified path.
        """
        self.data_path = data_path
        self.data = None
        self.parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
        self.crew_names = {
            'T01_Mara': 'Mara', 'T02_Laura': 'Laura', 'T03_Nancy': 'Nancy',
            'T04_Michelle': 'Michelle', 'T05_Felicitas': 'Felicitas',
            'T06_Mara_Selena': 'Mara Selena', 'T07_Geraldinn': 'Geraldinn', 'T08_Karina': 'Karina'
        }
        self.statistical_results = {}
        self.load_data()

    def load_data(self):
        """
        Loads the data from the CSV file and prepares it for all analyses.
        This now includes creating a numerical subject code required for pygam.
        """
        print("--- Loading and Preparing Data ---")
        try:
            self.data = pd.read_csv(self.data_path)
            self.data['Subject'] = self.data['Subject'].astype(str)
            self.data['Sol'] = self.data['Sol'].astype(int)
            self.data['Crew_Name'] = self.data['Subject'].map(self.crew_names)
            self.data = self.data.sort_values(['Subject', 'Sol'])
            self.data['Mission_Phase'] = self.data['Sol'].apply(self._categorize_mission_phase)
            self.data['Subject_code'] = self.data['Subject'].astype('category').cat.codes

            print(f"✓ Data loaded successfully from {self.data_path}")
            print(f"✓ Data shape: {self.data.shape}")
            print(f"✓ Crew members found: {len(self.data['Subject'].unique())}")
            print(f"✓ Mission Day (Sol) range: {self.data['Sol'].min()} to {self.data['Sol'].max()}")

        except FileNotFoundError:
            print(f"✗ ERROR: Data file not found at the specified path: '{self.data_path}'")
            print("  Please ensure the path is correct.")
        except Exception as e:
            print(f"✗ An unexpected error occurred during data loading: {e}")

    def _categorize_mission_phase(self, sol):
        """Helper function to categorize mission days into distinct phases."""
        if sol <= 5:
            return 'Early Mission (Sols 1-5)'
        elif sol <= 10:
            return 'Mid Mission (Sols 6-10)'
        else:
            return 'Late Mission (Sols 11+)'

    def perform_statistical_analysis(self):
        """Performs ANOVA, Correlation, and Tukey's HSD tests."""
        print("\n\n--- Performing Comprehensive Statistical Analysis ---")
        results = {}
        print("\n[1] One-Way ANOVA: Testing for differences between crew members...")
        for metric in self.parasympathetic_metrics:
            if metric in self.data.columns:
                groups = [self.data[self.data['Subject'] == crew][metric].dropna() for crew in self.data['Subject'].unique()]
                groups = [g for g in groups if len(g) > 0]
                if len(groups) >= 2:
                    f_stat, p_value = f_oneway(*groups)
                    results[f'anova_{metric}'] = {'F-statistic': f_stat, 'p-value': p_value}
                    sig_marker = '***' if p_value < 0.001 else '**' if p_value < 0.01 else '*' if p_value < 0.05 else 'ns (not significant)'
                    print(f"  - {metric.upper():<8}: F={f_stat:.3f}, p={p_value:.4f} ({sig_marker})")

        print("\n[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...")
        for metric in self.parasympathetic_metrics:
            if metric in self.data.columns:
                clean_data = self.data[['Sol', metric]].dropna()
                if len(clean_data) > 3:
                    r_pearson, p_pearson = pearsonr(clean_data['Sol'], clean_data[metric])
                    r_spearman, p_spearman = spearmanr(clean_data['Sol'], clean_data[metric])
                    results[f'corr_{metric}'] = {'pearson_r': r_pearson, 'pearson_p': p_pearson, 'spearman_r': r_spearman, 'spearman_p': p_spearman}
                    print(f"  - {metric.upper():<8}: Spearman r={r_spearman:.3f} (p={p_spearman:.4f}), Pearson r={r_pearson:.3f} (p={p_pearson:.4f})")

        print("\n[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...")
        for metric in self.parasympathetic_metrics:
            if f'anova_{metric}' in results and results[f'anova_{metric}']['p-value'] < 0.05:
                print(f"\n  - Tukey HSD results for {metric.upper()} (since ANOVA was significant):")
                clean_data = self.data[[metric, 'Subject']].dropna()
                tukey_result = pairwise_tukeyhsd(endog=clean_data[metric], groups=clean_data['Subject'], alpha=0.05)
                results[f'tukey_{metric}'] = tukey_result
                print(tukey_result)

        self.statistical_results = results
        print("\n✓ Statistical analysis complete.")

    def run_generalized_additive_models(self):
        """
        Fits, summarizes, and plots a Generalized Additive Model for each metric.
        GAMs are powerful tools that can model complex, non-linear relationships.
        """
        print("\n\n--- Running Generalized Additive Models (GAMs) ---")
        print("This analysis captures non-linear trends over the mission duration.")

        for metric in self.parasympathetic_metrics:
            if metric not in self.data.columns:
                print(f"\n--- Skipping {metric.upper()}: Column not found ---")
                continue

            print(f"\n--- Analyzing Metric: {metric.upper()} ---")

            try:
                model_data = self.data[['Sol', 'Subject', 'Subject_code', metric]].dropna()
                if len(model_data['Subject_code'].unique()) < 2 or len(model_data) < 15:
                    print("Not enough data to robustly fit the model.")
                    continue

                X = model_data[['Sol', 'Subject_code']]
                y = model_data[metric]

                gam = GAM(s(0, n_splines=10) + f(1)).fit(X, y)
                print(gam.summary())

                fig, ax = plt.subplots(1, 1, figsize=(12, 8))
                XX = gam.generate_X_grid(term=0)
                pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)

                ax.plot(XX[:, 0], pdep, color='royalblue', linewidth=3, label='GAM Trend')
                ax.fill_between(XX[:, 0], confi[:, 0], confi[:, 1], color='cornflowerblue', alpha=0.3, label='95% Confidence Interval')
                sns.scatterplot(x='Sol', y=metric, hue='Subject', data=model_data, ax=ax, alpha=0.7, palette='husl', s=50)

                ax.set_title(f"Non-Linear Trend for {metric.upper()} (GAM Analysis)", fontsize=16, fontweight='bold')
                ax.set_xlabel("Sol (Mission Day)")
                ax.set_ylabel(f"{metric.upper()} Value")
                ax.grid(True, which='both', linestyle='--', linewidth=0.5)
                ax.legend(title='Subject', bbox_to_anchor=(1.05, 1), loc='upper left')
                plt.tight_layout()
                plt.show()

            except Exception as e:
                print(f"Could not fit GAM for {metric}. An unexpected error occurred: {e}")

    def plot_longitudinal_trends(self):
        """Creates longitudinal trend plots for each metric (linear trend)."""
        print("\n--- Plotting Longitudinal Trends (Linear Fit) ---")
        fig, axes = plt.subplots(3, 2, figsize=(20, 22))
        axes = axes.flatten()
        for i, metric in enumerate(self.parasympathetic_metrics):
            ax = axes[i]
            for subject_id, crew_data in self.data.groupby('Subject'):
                ax.plot(crew_data['Sol'], crew_data[metric], marker='o', linestyle='-', alpha=0.8, label=self.crew_names[subject_id])
                if len(crew_data['Sol'].dropna()) > 1:
                    z = np.polyfit(crew_data['Sol'].dropna(), crew_data[metric].dropna(), 1)
                    p = np.poly1d(z)
                    ax.plot(crew_data['Sol'], p(crew_data['Sol']), linestyle='--', alpha=0.6, color=ax.get_lines()[-1].get_color())

            ax.set_title(f'Longitudinal Trend: {metric.upper()}', fontweight='bold')
            ax.set_xlabel('Sol (Mission Day)')
            ax.set_ylabel(f'{metric.upper()} Value')
            ax.grid(True, which='both', linestyle='--', linewidth=0.5)

        handles, labels = axes[0].get_legend_handles_labels()
        unique_handles = [handles[i] for i in range(0, len(handles), 2)]
        unique_labels = [labels[i] for i in range(0, len(labels), 2)]
        fig.legend(unique_handles, unique_labels, loc='upper right', bbox_to_anchor=(1.1, 0.95), title="Crew Member")
        plt.tight_layout(rect=[0, 0, 0.9, 1])
        plt.suptitle('Parasympathetic HRV Metrics Over Mission Time (Linear Fit)', fontsize=20, y=1.02)
        plt.show()

    def plot_mission_phase_analysis(self):
        """Analyzes and visualizes activity across mission phases."""
        print("\n--- Plotting Mission Phase Analysis ---")
        fig, axes = plt.subplots(3, 2, figsize=(20, 22))
        axes = axes.flatten()
        phase_order = ['Early Mission (Sols 1-5)', 'Mid Mission (Sols 6-10)', 'Late Mission (Sols 11+)']
        for i, metric in enumerate(self.parasympathetic_metrics):
            ax = axes[i]
            sns.boxplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, showfliers=False)
            sns.stripplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, color='black', alpha=0.5, size=4)
            ax.set_title(f'Analysis by Mission Phase: {metric.upper()}', fontweight='bold')
            ax.set_xlabel('Mission Phase')
            ax.set_ylabel(f'{metric.upper()} Value')
            ax.tick_params(axis='x', rotation=10)
        plt.tight_layout()
        plt.suptitle('Parasympathetic HRV Metrics Across Mission Phases', fontsize=20, y=1.02)
        plt.show()

    def generate_scientific_report(self):
        """Generates a formatted scientific report summarizing all findings."""
        print("\n\n" + "="*80)
        print("           COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT")
        print("="*80)
        report = [
            "\n### 1. INTRODUCTION ###",
            "This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.",
            "PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.",
            "The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.",
            "\n### 2. METHODOLOGY ###",
            "The following validated HRV metrics were used to assess parasympathetic activity:",
            "- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).",
            "- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.",
            "- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.",
            "\nStatistical analyses included:",
            "  1. One-Way ANOVA and Tukey's HSD for between-crew comparisons.",
            "  2. Spearman/Pearson correlation to assess linear trends over time.",
            "  3. Generalized Additive Models (GAMs) to identify non-linear trends over time.",
            "\n### 3. STATISTICAL RESULTS ###",
            "\n--- Between-Crew Differences (ANOVA) ---",
            "This test checks if there are significant differences in the average metric values among crew members over the entire mission.",
        ]
        for metric in self.parasympathetic_metrics:
            if f'anova_{metric}' in self.statistical_results:
                res = self.statistical_results[f'anova_{metric}']
                sig = 'p < 0.05, SIGNIFICANT' if res['p-value'] < 0.05 else 'p > 0.05, not significant'
                report.append(f"  - {metric.upper()}: F={res['F-statistic']:.2f}, p={res['p-value']:.3f} ({sig})")

        report.append("\n--- Mission Time Trend Analysis (Correlation) ---")
        report.append("This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.")
        for metric in self.parasympathetic_metrics:
            if f'corr_{metric}' in self.statistical_results:
                res = self.statistical_results[f'corr_{metric}']
                sig = 'p < 0.05, SIGNIFICANT' if res['spearman_p'] < 0.05 else 'p > 0.05, not significant'
                report.append(f"  - {metric.upper()}: Spearman r={res['spearman_r']:.3f} (p={res['spearman_p']:.3f}, {sig})")

        report.append("\n--- Non-Linear Trend Analysis (GAM) ---")
        report.append("GAMs were used to explore complex, non-linear patterns in HRV metrics over the mission duration.")
        report.append("The plots generated by this analysis visualize the average trend for the crew, revealing periods of increase, decrease, or stability that a simple linear model might miss.")
        report.append("Interpretation should be based on visual inspection of the GAM plots: a significant curve or 'wiggle' suggests a non-linear relationship with time.")

        report.append("\n--- Pairwise Crew Comparisons (Tukey's HSD) ---")
        report.append("For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.")
        for metric in self.parasympathetic_metrics:
            if f'tukey_{metric}' in self.statistical_results:
                report.append(f"\n  Significant differences for {metric.upper()}:")
                report.append(str(self.statistical_results[f'tukey_{metric}']))
        
        print("\n".join(report))
        print("\n" + "="*80)
        print("                         END OF REPORT")
        print("="*80)

# --- 4. Main Execution Block ---
# Create an instance of the analyzer and run all analysis methods.
analyzer = ParasympatheticAnalyzer()

if analyzer.data is not None:
    analyzer.perform_statistical_analysis()
    analyzer.plot_longitudinal_trends()
    analyzer.plot_mission_phase_analysis()
    analyzer.run_generalized_additive_models() # Run the new GAM analysis
    analyzer.generate_scientific_report()
else:
    print("\nAnalysis could not proceed because data failed to load.")
--- Loading and Preparing Data ---
✓ Data loaded successfully from C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv
✓ Data shape: (37, 36)
✓ Crew members found: 8
✓ Mission Day (Sol) range: 2 to 16


--- Performing Comprehensive Statistical Analysis ---

[1] One-Way ANOVA: Testing for differences between crew members...
  - RMSSD   : F=3.400, p=0.0090 (**)
  - PNNI_50 : F=2.485, p=0.0396 (*)
  - PNNI_20 : F=2.354, p=0.0493 (*)
  - HF      : F=2.320, p=0.0523 (ns (not significant))
  - HFNU    : F=3.547, p=0.0071 (**)
  - SD1     : F=3.399, p=0.0090 (**)

[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...
  - RMSSD   : Spearman r=-0.279 (p=0.0945), Pearson r=-0.118 (p=0.4856)
  - PNNI_50 : Spearman r=-0.308 (p=0.0640), Pearson r=-0.099 (p=0.5607)
  - PNNI_20 : Spearman r=-0.241 (p=0.1500), Pearson r=-0.135 (p=0.4271)
  - HF      : Spearman r=-0.214 (p=0.2033), Pearson r=-0.001 (p=0.9933)
  - HFNU    : Spearman r=-0.318 (p=0.0553), Pearson r=-0.279 (p=0.0946)
  - SD1     : Spearman r=-0.278 (p=0.0962), Pearson r=-0.118 (p=0.4855)

[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...

  - Tukey HSD results for RMSSD (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura    -2.56 0.9993 -17.9577 12.8377  False
       T01_Mara       T03_Nancy    0.285    1.0 -13.6428 14.2128  False
       T01_Mara    T04_Michelle  -3.1225 0.9952 -17.0503 10.8053  False
       T01_Mara   T05_Felicitas    13.15 0.0293   0.8669 25.4331   True
       T01_Mara T06_Mara_Selena   -1.345    1.0 -13.6281 10.9381  False
       T01_Mara   T07_Geraldinn   -2.105 0.9996 -16.0328 11.8228  False
       T01_Mara      T08_Karina   -5.495 0.9713 -23.4757 12.4857  False
      T02_Laura       T03_Nancy    2.845 0.9993  -14.526  20.216  False
      T02_Laura    T04_Michelle  -0.5625    1.0 -17.9335 16.8085  False
      T02_Laura   T05_Felicitas    15.71 0.0593  -0.3724 31.7924  False
      T02_Laura T06_Mara_Selena    1.215    1.0 -14.8674 17.2974  False
      T02_Laura   T07_Geraldinn    0.455    1.0  -16.916  17.826  False
      T02_Laura      T08_Karina   -2.935 0.9997 -23.6973 17.8273  False
      T03_Nancy    T04_Michelle  -3.4075 0.9966 -19.4899 12.6749  False
      T03_Nancy   T05_Felicitas   12.865 0.1193  -1.8161 27.5461  False
      T03_Nancy T06_Mara_Selena    -1.63    1.0 -16.3111 13.0511  False
      T03_Nancy   T07_Geraldinn    -2.39 0.9997 -18.4724 13.6924  False
      T03_Nancy      T08_Karina    -5.78  0.977 -25.4768 13.9168  False
   T04_Michelle   T05_Felicitas  16.2725 0.0218   1.5914 30.9536   True
   T04_Michelle T06_Mara_Selena   1.7775 0.9999 -12.9036 16.4586  False
   T04_Michelle   T07_Geraldinn   1.0175    1.0 -15.0649 17.0999  False
   T04_Michelle      T08_Karina  -2.3725 0.9999 -22.0693 17.3243  False
  T05_Felicitas T06_Mara_Selena  -14.495 0.0226 -27.6262 -1.3638   True
  T05_Felicitas   T07_Geraldinn  -15.255 0.0373 -29.9361 -0.5739   True
  T05_Felicitas      T08_Karina  -18.645 0.0485 -37.2153 -0.0747   True
T06_Mara_Selena   T07_Geraldinn    -0.76    1.0 -15.4411 13.9211  False
T06_Mara_Selena      T08_Karina    -4.15 0.9953 -22.7203 14.4203  False
  T07_Geraldinn      T08_Karina    -3.39 0.9991 -23.0868 16.3068  False
-----------------------------------------------------------------------

  - Tukey HSD results for PNNI_50 (since ANOVA was significant):
         Multiple Comparison of Means - Tukey HSD, FWER=0.05         
=====================================================================
     group1          group2     meandiff p-adj   lower  upper  reject
---------------------------------------------------------------------
       T01_Mara       T02_Laura  -0.1483    1.0  -4.063 3.7663  False
       T01_Mara       T03_Nancy   0.3775    1.0 -3.1634 3.9184  False
       T01_Mara    T04_Michelle  -0.3125    1.0 -3.8534 3.2284  False
       T01_Mara   T05_Felicitas    3.225 0.0391  0.1022 6.3478   True
       T01_Mara T06_Mara_Selena   0.2717    1.0 -2.8511 3.3945  False
       T01_Mara   T07_Geraldinn  -0.2375    1.0 -3.7784 3.3034  False
       T01_Mara      T08_Karina   -0.415    1.0 -4.9863 4.1563  False
      T02_Laura       T03_Nancy   0.5258 0.9999 -3.8905 4.9421  False
      T02_Laura    T04_Michelle  -0.1642    1.0 -4.5805 4.2521  False
      T02_Laura   T05_Felicitas   3.3733 0.1658 -0.7154  7.462  False
      T02_Laura T06_Mara_Selena     0.42    1.0 -3.6687 4.5087  False
      T02_Laura   T07_Geraldinn  -0.0892    1.0 -4.5055 4.3271  False
      T02_Laura      T08_Karina  -0.2667    1.0 -5.5451 5.0118  False
      T03_Nancy    T04_Michelle    -0.69 0.9992 -4.7787 3.3987  False
      T03_Nancy   T05_Felicitas   2.8475 0.2399 -0.8849 6.5799  False
      T03_Nancy T06_Mara_Selena  -0.1058    1.0 -3.8383 3.6266  False
      T03_Nancy   T07_Geraldinn   -0.615 0.9996 -4.7037 3.4737  False
      T03_Nancy      T08_Karina  -0.7925 0.9995 -5.8001 4.2151  False
   T04_Michelle   T05_Felicitas   3.5375  0.073 -0.1949 7.2699  False
   T04_Michelle T06_Mara_Selena   0.5842 0.9995 -3.1483 4.3166  False
   T04_Michelle   T07_Geraldinn    0.075    1.0 -4.0137 4.1637  False
   T04_Michelle      T08_Karina  -0.1025    1.0 -5.1101 4.9051  False
  T05_Felicitas T06_Mara_Selena  -2.9533 0.1128 -6.2917 0.3851  False
  T05_Felicitas   T07_Geraldinn  -3.4625 0.0841 -7.1949 0.2699  False
  T05_Felicitas      T08_Karina    -3.64 0.2291 -8.3612 1.0812  False
T06_Mara_Selena   T07_Geraldinn  -0.5092 0.9998 -4.2416 3.2233  False
T06_Mara_Selena      T08_Karina  -0.6867 0.9997 -5.4079 4.0345  False
  T07_Geraldinn      T08_Karina  -0.1775    1.0 -5.1851 4.8301  False
---------------------------------------------------------------------

  - Tukey HSD results for PNNI_20 (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura   0.6396    1.0  -10.921 12.2002  False
       T01_Mara       T03_Nancy   3.3688 0.9619  -7.0882 13.8257  False
       T01_Mara    T04_Michelle   0.2713    1.0 -10.1857 10.7282  False
       T01_Mara   T05_Felicitas   9.8429   0.03   0.6207 19.0651   True
       T01_Mara T06_Mara_Selena   1.5379 0.9993  -7.6843 10.7601  False
       T01_Mara   T07_Geraldinn   1.3563 0.9999  -9.1007 11.8132  False
       T01_Mara      T08_Karina  -1.6837 0.9999 -15.1836 11.8161  False
      T02_Laura       T03_Nancy   2.7292 0.9969  -10.313 15.7713  False
      T02_Laura    T04_Michelle  -0.3683    1.0 -13.4105 12.6738  False
      T02_Laura   T05_Felicitas   9.2033 0.2408  -2.8713  21.278  False
      T02_Laura T06_Mara_Selena   0.8983    1.0 -11.1763  12.973  False
      T02_Laura   T07_Geraldinn   0.7167    1.0 -12.3255 13.7588  False
      T02_Laura      T08_Karina  -2.3233 0.9996 -17.9117  13.265  False
      T03_Nancy    T04_Michelle  -3.0975 0.9893 -15.1722  8.9772  False
      T03_Nancy   T05_Felicitas   6.4742 0.5518  -4.5484 17.4968  False
      T03_Nancy T06_Mara_Selena  -1.8308 0.9993 -12.8534  9.1918  False
      T03_Nancy   T07_Geraldinn  -2.0125 0.9993 -14.0872 10.0622  False
      T03_Nancy      T08_Karina  -5.0525 0.9483 -19.8409  9.7359  False
   T04_Michelle   T05_Felicitas   9.5717 0.1257  -1.4509 20.5943  False
   T04_Michelle T06_Mara_Selena   1.2667 0.9999  -9.7559 12.2893  False
   T04_Michelle   T07_Geraldinn    1.085    1.0 -10.9897 13.1597  False
   T04_Michelle      T08_Karina   -1.955 0.9998 -16.7434 12.8334  False
  T05_Felicitas T06_Mara_Selena   -8.305 0.1487 -18.1639  1.5539  False
  T05_Felicitas   T07_Geraldinn  -8.4867 0.2305 -19.5093  2.5359  False
  T05_Felicitas      T08_Karina -11.5267  0.164 -25.4693   2.416  False
T06_Mara_Selena   T07_Geraldinn  -0.1817    1.0 -11.2043 10.8409  False
T06_Mara_Selena      T08_Karina  -3.2217 0.9943 -17.1643  10.721  False
  T07_Geraldinn      T08_Karina    -3.04 0.9972 -17.8284 11.7484  False
-----------------------------------------------------------------------

  - Tukey HSD results for HFNU (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -2.7446 0.8683  -9.3362   3.847  False
       T01_Mara       T03_Nancy  -4.5238 0.2456 -10.4861  1.4386  False
       T01_Mara    T04_Michelle  -6.6712   0.02 -12.6336 -0.7089   True
       T01_Mara   T05_Felicitas  -2.1363 0.8818  -7.3945   3.122  False
       T01_Mara T06_Mara_Selena  -6.3529 0.0097 -11.6112 -1.0947   True
       T01_Mara   T07_Geraldinn  -4.8713 0.1742 -10.8336  1.0911  False
       T01_Mara      T08_Karina  -5.7813 0.2563 -13.4786  1.9161  False
      T02_Laura       T03_Nancy  -1.7792 0.9929  -9.2155  5.6571  False
      T02_Laura    T04_Michelle  -3.9267 0.6735  -11.363  3.5096  False
      T02_Laura   T05_Felicitas   0.6083    1.0  -6.2764   7.493  False
      T02_Laura T06_Mara_Selena  -3.6083 0.6814  -10.493  3.2764  False
      T02_Laura   T07_Geraldinn  -2.1267 0.9801   -9.563  5.3096  False
      T02_Laura      T08_Karina  -3.0367 0.9483 -11.9248  5.8514  False
      T03_Nancy    T04_Michelle  -2.1475 0.9679  -9.0322  4.7372  False
      T03_Nancy   T05_Felicitas   2.3875 0.9131  -3.8973  8.6723  False
      T03_Nancy T06_Mara_Selena  -1.8292  0.978   -8.114  4.4557  False
      T03_Nancy   T07_Geraldinn  -0.3475    1.0  -7.2322  6.5372  False
      T03_Nancy      T08_Karina  -1.2575 0.9996  -9.6895  7.1745  False
   T04_Michelle   T05_Felicitas    4.535 0.3007  -1.7498 10.8198  False
   T04_Michelle T06_Mara_Selena   0.3183    1.0  -5.9665  6.6032  False
   T04_Michelle   T07_Geraldinn      1.8  0.988  -5.0847  8.6847  False
   T04_Michelle      T08_Karina     0.89    1.0   -7.542   9.322  False
  T05_Felicitas T06_Mara_Selena  -4.2167 0.2577   -9.838  1.4047  False
  T05_Felicitas   T07_Geraldinn   -2.735  0.841  -9.0198  3.5498  False
  T05_Felicitas      T08_Karina   -3.645 0.8036 -11.5947  4.3047  False
T06_Mara_Selena   T07_Geraldinn   1.4817 0.9935  -4.8032  7.7665  False
T06_Mara_Selena      T08_Karina   0.5717    1.0  -7.3781  8.5214  False
  T07_Geraldinn      T08_Karina    -0.91    1.0   -9.342   7.522  False
-----------------------------------------------------------------------

  - Tukey HSD results for SD1 (since ANOVA was significant):
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -1.8121 0.9993 -12.7001  9.0759  False
       T01_Mara       T03_Nancy   0.2012    1.0  -9.6473 10.0498  False
       T01_Mara    T04_Michelle  -2.2062 0.9952 -12.0548  7.6423  False
       T01_Mara   T05_Felicitas   9.2979 0.0293   0.6123 17.9835   True
       T01_Mara T06_Mara_Selena  -0.9521    1.0  -9.6377  7.7335  False
       T01_Mara   T07_Geraldinn  -1.4888 0.9996 -11.3373  8.3598  False
       T01_Mara      T08_Karina  -3.8837 0.9713 -16.5982  8.8307  False
      T02_Laura       T03_Nancy   2.0133 0.9993   -10.27 14.2966  False
      T02_Laura    T04_Michelle  -0.3942    1.0 -12.6775 11.8891  False
      T02_Laura   T05_Felicitas    11.11 0.0592  -0.2621 22.4821  False
      T02_Laura T06_Mara_Selena     0.86    1.0 -10.5121 12.2321  False
      T02_Laura   T07_Geraldinn   0.3233    1.0   -11.96 12.6066  False
      T02_Laura      T08_Karina  -2.0717 0.9998  -16.753 12.6097  False
      T03_Nancy    T04_Michelle  -2.4075 0.9966 -13.7796  8.9646  False
      T03_Nancy   T05_Felicitas   9.0967 0.1193  -1.2846 19.4779  False
      T03_Nancy T06_Mara_Selena  -1.1533    1.0 -11.5346  9.2279  False
      T03_Nancy   T07_Geraldinn    -1.69 0.9997 -13.0621  9.6821  False
      T03_Nancy      T08_Karina   -4.085 0.9771 -18.0129  9.8429  False
   T04_Michelle   T05_Felicitas  11.5042 0.0218   1.1229 21.8854   True
   T04_Michelle T06_Mara_Selena   1.2542 0.9999  -9.1271 11.6354  False
   T04_Michelle   T07_Geraldinn   0.7175    1.0 -10.6546 12.0896  False
   T04_Michelle      T08_Karina  -1.6775 0.9999 -15.6054 12.2504  False
  T05_Felicitas T06_Mara_Selena   -10.25 0.0226 -19.5353 -0.9647   True
  T05_Felicitas   T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054   True
  T05_Felicitas      T08_Karina -13.1817 0.0486 -26.3131 -0.0503   True
T06_Mara_Selena   T07_Geraldinn  -0.5367    1.0 -10.9179  9.8446  False
T06_Mara_Selena      T08_Karina  -2.9317 0.9953 -16.0631 10.1997  False
  T07_Geraldinn      T08_Karina   -2.395 0.9991 -16.3229 11.5329  False
-----------------------------------------------------------------------

✓ Statistical analysis complete.

--- Plotting Longitudinal Trends (Linear Fit) ---
No description has been provided for this image
--- Plotting Mission Phase Analysis ---
No description has been provided for this image

--- Running Generalized Additive Models (GAMs) ---
This analysis captures non-linear trends over the mission duration.

--- Analyzing Metric: RMSSD ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -178.2912
Number of Samples:                           37 AIC:                                              379.8952
                                                AICc:                                             392.0156
                                                GCV:                                               98.0213
                                                Scale:                                             49.0325
                                                Pseudo R-Squared:                                   0.4969
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          7.65e-01                 
f(1)                              [0.6]                8            5.3          8.65e-03     **          
intercept                                              1            0.0          2.03e-06     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: PNNI_50 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                   -80.8561
Number of Samples:                           37 AIC:                                               185.025
                                                AICc:                                             197.1454
                                                GCV:                                                6.3395
                                                Scale:                                              3.1712
                                                Pseudo R-Squared:                                   0.4271
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.94e-01                 
f(1)                              [0.6]                8            5.3          4.11e-02     *           
intercept                                              1            0.0          1.70e-01                 
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: PNNI_20 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -156.9732
Number of Samples:                           37 AIC:                                              337.2592
                                                AICc:                                             349.3795
                                                GCV:                                               54.7781
                                                Scale:                                             27.4013
                                                Pseudo R-Squared:                                   0.4209
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          9.49e-01                 
f(1)                              [0.6]                8            5.3          4.71e-02     *           
intercept                                              1            0.0          2.86e-03     **          
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: HF ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                   -352.976
Number of Samples:                           37 AIC:                                              729.2647
                                                AICc:                                             741.3851
                                                GCV:                                            11087.6823
                                                Scale:                                           5546.3138
                                                Pseudo R-Squared:                                   0.4016
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.62e-01                 
f(1)                              [0.6]                8            5.3          5.52e-02     .           
intercept                                              1            0.0          1.63e-01                 
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: HFNU ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -110.8589
Number of Samples:                           37 AIC:                                              245.0305
                                                AICc:                                             257.1509
                                                GCV:                                               15.2293
                                                Scale:                                               7.618
                                                Pseudo R-Squared:                                   0.5816
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.23e-02     .           
f(1)                              [0.6]                8            5.3          4.28e-03     **          
intercept                                              1            0.0          1.88e-09     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image
--- Analyzing Metric: SD1 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -152.9113
Number of Samples:                           37 AIC:                                              329.1353
                                                AICc:                                             341.2557
                                                GCV:                                               49.0078
                                                Scale:                                             24.5148
                                                Pseudo R-Squared:                                   0.4969
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          7.65e-01                 
f(1)                              [0.6]                8            5.3          8.65e-03     **          
intercept                                              1            0.0          2.03e-06     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
None
No description has been provided for this image

================================================================================
           COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT
================================================================================

### 1. INTRODUCTION ###
This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.
PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.
The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.

### 2. METHODOLOGY ###
The following validated HRV metrics were used to assess parasympathetic activity:
- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).
- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.
- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.

Statistical analyses included:
  1. One-Way ANOVA and Tukey's HSD for between-crew comparisons.
  2. Spearman/Pearson correlation to assess linear trends over time.
  3. Generalized Additive Models (GAMs) to identify non-linear trends over time.

### 3. STATISTICAL RESULTS ###

--- Between-Crew Differences (ANOVA) ---
This test checks if there are significant differences in the average metric values among crew members over the entire mission.
  - RMSSD: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)
  - PNNI_50: F=2.49, p=0.040 (p < 0.05, SIGNIFICANT)
  - PNNI_20: F=2.35, p=0.049 (p < 0.05, SIGNIFICANT)
  - HF: F=2.32, p=0.052 (p > 0.05, not significant)
  - HFNU: F=3.55, p=0.007 (p < 0.05, SIGNIFICANT)
  - SD1: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)

--- Mission Time Trend Analysis (Correlation) ---
This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.
  - RMSSD: Spearman r=-0.279 (p=0.094, p > 0.05, not significant)
  - PNNI_50: Spearman r=-0.308 (p=0.064, p > 0.05, not significant)
  - PNNI_20: Spearman r=-0.241 (p=0.150, p > 0.05, not significant)
  - HF: Spearman r=-0.214 (p=0.203, p > 0.05, not significant)
  - HFNU: Spearman r=-0.318 (p=0.055, p > 0.05, not significant)
  - SD1: Spearman r=-0.278 (p=0.096, p > 0.05, not significant)

--- Non-Linear Trend Analysis (GAM) ---
GAMs were used to explore complex, non-linear patterns in HRV metrics over the mission duration.
The plots generated by this analysis visualize the average trend for the crew, revealing periods of increase, decrease, or stability that a simple linear model might miss.
Interpretation should be based on visual inspection of the GAM plots: a significant curve or 'wiggle' suggests a non-linear relationship with time.

--- Pairwise Crew Comparisons (Tukey's HSD) ---
For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.

  Significant differences for RMSSD:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura    -2.56 0.9993 -17.9577 12.8377  False
       T01_Mara       T03_Nancy    0.285    1.0 -13.6428 14.2128  False
       T01_Mara    T04_Michelle  -3.1225 0.9952 -17.0503 10.8053  False
       T01_Mara   T05_Felicitas    13.15 0.0293   0.8669 25.4331   True
       T01_Mara T06_Mara_Selena   -1.345    1.0 -13.6281 10.9381  False
       T01_Mara   T07_Geraldinn   -2.105 0.9996 -16.0328 11.8228  False
       T01_Mara      T08_Karina   -5.495 0.9713 -23.4757 12.4857  False
      T02_Laura       T03_Nancy    2.845 0.9993  -14.526  20.216  False
      T02_Laura    T04_Michelle  -0.5625    1.0 -17.9335 16.8085  False
      T02_Laura   T05_Felicitas    15.71 0.0593  -0.3724 31.7924  False
      T02_Laura T06_Mara_Selena    1.215    1.0 -14.8674 17.2974  False
      T02_Laura   T07_Geraldinn    0.455    1.0  -16.916  17.826  False
      T02_Laura      T08_Karina   -2.935 0.9997 -23.6973 17.8273  False
      T03_Nancy    T04_Michelle  -3.4075 0.9966 -19.4899 12.6749  False
      T03_Nancy   T05_Felicitas   12.865 0.1193  -1.8161 27.5461  False
      T03_Nancy T06_Mara_Selena    -1.63    1.0 -16.3111 13.0511  False
      T03_Nancy   T07_Geraldinn    -2.39 0.9997 -18.4724 13.6924  False
      T03_Nancy      T08_Karina    -5.78  0.977 -25.4768 13.9168  False
   T04_Michelle   T05_Felicitas  16.2725 0.0218   1.5914 30.9536   True
   T04_Michelle T06_Mara_Selena   1.7775 0.9999 -12.9036 16.4586  False
   T04_Michelle   T07_Geraldinn   1.0175    1.0 -15.0649 17.0999  False
   T04_Michelle      T08_Karina  -2.3725 0.9999 -22.0693 17.3243  False
  T05_Felicitas T06_Mara_Selena  -14.495 0.0226 -27.6262 -1.3638   True
  T05_Felicitas   T07_Geraldinn  -15.255 0.0373 -29.9361 -0.5739   True
  T05_Felicitas      T08_Karina  -18.645 0.0485 -37.2153 -0.0747   True
T06_Mara_Selena   T07_Geraldinn    -0.76    1.0 -15.4411 13.9211  False
T06_Mara_Selena      T08_Karina    -4.15 0.9953 -22.7203 14.4203  False
  T07_Geraldinn      T08_Karina    -3.39 0.9991 -23.0868 16.3068  False
-----------------------------------------------------------------------

  Significant differences for PNNI_50:
         Multiple Comparison of Means - Tukey HSD, FWER=0.05         
=====================================================================
     group1          group2     meandiff p-adj   lower  upper  reject
---------------------------------------------------------------------
       T01_Mara       T02_Laura  -0.1483    1.0  -4.063 3.7663  False
       T01_Mara       T03_Nancy   0.3775    1.0 -3.1634 3.9184  False
       T01_Mara    T04_Michelle  -0.3125    1.0 -3.8534 3.2284  False
       T01_Mara   T05_Felicitas    3.225 0.0391  0.1022 6.3478   True
       T01_Mara T06_Mara_Selena   0.2717    1.0 -2.8511 3.3945  False
       T01_Mara   T07_Geraldinn  -0.2375    1.0 -3.7784 3.3034  False
       T01_Mara      T08_Karina   -0.415    1.0 -4.9863 4.1563  False
      T02_Laura       T03_Nancy   0.5258 0.9999 -3.8905 4.9421  False
      T02_Laura    T04_Michelle  -0.1642    1.0 -4.5805 4.2521  False
      T02_Laura   T05_Felicitas   3.3733 0.1658 -0.7154  7.462  False
      T02_Laura T06_Mara_Selena     0.42    1.0 -3.6687 4.5087  False
      T02_Laura   T07_Geraldinn  -0.0892    1.0 -4.5055 4.3271  False
      T02_Laura      T08_Karina  -0.2667    1.0 -5.5451 5.0118  False
      T03_Nancy    T04_Michelle    -0.69 0.9992 -4.7787 3.3987  False
      T03_Nancy   T05_Felicitas   2.8475 0.2399 -0.8849 6.5799  False
      T03_Nancy T06_Mara_Selena  -0.1058    1.0 -3.8383 3.6266  False
      T03_Nancy   T07_Geraldinn   -0.615 0.9996 -4.7037 3.4737  False
      T03_Nancy      T08_Karina  -0.7925 0.9995 -5.8001 4.2151  False
   T04_Michelle   T05_Felicitas   3.5375  0.073 -0.1949 7.2699  False
   T04_Michelle T06_Mara_Selena   0.5842 0.9995 -3.1483 4.3166  False
   T04_Michelle   T07_Geraldinn    0.075    1.0 -4.0137 4.1637  False
   T04_Michelle      T08_Karina  -0.1025    1.0 -5.1101 4.9051  False
  T05_Felicitas T06_Mara_Selena  -2.9533 0.1128 -6.2917 0.3851  False
  T05_Felicitas   T07_Geraldinn  -3.4625 0.0841 -7.1949 0.2699  False
  T05_Felicitas      T08_Karina    -3.64 0.2291 -8.3612 1.0812  False
T06_Mara_Selena   T07_Geraldinn  -0.5092 0.9998 -4.2416 3.2233  False
T06_Mara_Selena      T08_Karina  -0.6867 0.9997 -5.4079 4.0345  False
  T07_Geraldinn      T08_Karina  -0.1775    1.0 -5.1851 4.8301  False
---------------------------------------------------------------------

  Significant differences for PNNI_20:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura   0.6396    1.0  -10.921 12.2002  False
       T01_Mara       T03_Nancy   3.3688 0.9619  -7.0882 13.8257  False
       T01_Mara    T04_Michelle   0.2713    1.0 -10.1857 10.7282  False
       T01_Mara   T05_Felicitas   9.8429   0.03   0.6207 19.0651   True
       T01_Mara T06_Mara_Selena   1.5379 0.9993  -7.6843 10.7601  False
       T01_Mara   T07_Geraldinn   1.3563 0.9999  -9.1007 11.8132  False
       T01_Mara      T08_Karina  -1.6837 0.9999 -15.1836 11.8161  False
      T02_Laura       T03_Nancy   2.7292 0.9969  -10.313 15.7713  False
      T02_Laura    T04_Michelle  -0.3683    1.0 -13.4105 12.6738  False
      T02_Laura   T05_Felicitas   9.2033 0.2408  -2.8713  21.278  False
      T02_Laura T06_Mara_Selena   0.8983    1.0 -11.1763  12.973  False
      T02_Laura   T07_Geraldinn   0.7167    1.0 -12.3255 13.7588  False
      T02_Laura      T08_Karina  -2.3233 0.9996 -17.9117  13.265  False
      T03_Nancy    T04_Michelle  -3.0975 0.9893 -15.1722  8.9772  False
      T03_Nancy   T05_Felicitas   6.4742 0.5518  -4.5484 17.4968  False
      T03_Nancy T06_Mara_Selena  -1.8308 0.9993 -12.8534  9.1918  False
      T03_Nancy   T07_Geraldinn  -2.0125 0.9993 -14.0872 10.0622  False
      T03_Nancy      T08_Karina  -5.0525 0.9483 -19.8409  9.7359  False
   T04_Michelle   T05_Felicitas   9.5717 0.1257  -1.4509 20.5943  False
   T04_Michelle T06_Mara_Selena   1.2667 0.9999  -9.7559 12.2893  False
   T04_Michelle   T07_Geraldinn    1.085    1.0 -10.9897 13.1597  False
   T04_Michelle      T08_Karina   -1.955 0.9998 -16.7434 12.8334  False
  T05_Felicitas T06_Mara_Selena   -8.305 0.1487 -18.1639  1.5539  False
  T05_Felicitas   T07_Geraldinn  -8.4867 0.2305 -19.5093  2.5359  False
  T05_Felicitas      T08_Karina -11.5267  0.164 -25.4693   2.416  False
T06_Mara_Selena   T07_Geraldinn  -0.1817    1.0 -11.2043 10.8409  False
T06_Mara_Selena      T08_Karina  -3.2217 0.9943 -17.1643  10.721  False
  T07_Geraldinn      T08_Karina    -3.04 0.9972 -17.8284 11.7484  False
-----------------------------------------------------------------------

  Significant differences for HFNU:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -2.7446 0.8683  -9.3362   3.847  False
       T01_Mara       T03_Nancy  -4.5238 0.2456 -10.4861  1.4386  False
       T01_Mara    T04_Michelle  -6.6712   0.02 -12.6336 -0.7089   True
       T01_Mara   T05_Felicitas  -2.1363 0.8818  -7.3945   3.122  False
       T01_Mara T06_Mara_Selena  -6.3529 0.0097 -11.6112 -1.0947   True
       T01_Mara   T07_Geraldinn  -4.8713 0.1742 -10.8336  1.0911  False
       T01_Mara      T08_Karina  -5.7813 0.2563 -13.4786  1.9161  False
      T02_Laura       T03_Nancy  -1.7792 0.9929  -9.2155  5.6571  False
      T02_Laura    T04_Michelle  -3.9267 0.6735  -11.363  3.5096  False
      T02_Laura   T05_Felicitas   0.6083    1.0  -6.2764   7.493  False
      T02_Laura T06_Mara_Selena  -3.6083 0.6814  -10.493  3.2764  False
      T02_Laura   T07_Geraldinn  -2.1267 0.9801   -9.563  5.3096  False
      T02_Laura      T08_Karina  -3.0367 0.9483 -11.9248  5.8514  False
      T03_Nancy    T04_Michelle  -2.1475 0.9679  -9.0322  4.7372  False
      T03_Nancy   T05_Felicitas   2.3875 0.9131  -3.8973  8.6723  False
      T03_Nancy T06_Mara_Selena  -1.8292  0.978   -8.114  4.4557  False
      T03_Nancy   T07_Geraldinn  -0.3475    1.0  -7.2322  6.5372  False
      T03_Nancy      T08_Karina  -1.2575 0.9996  -9.6895  7.1745  False
   T04_Michelle   T05_Felicitas    4.535 0.3007  -1.7498 10.8198  False
   T04_Michelle T06_Mara_Selena   0.3183    1.0  -5.9665  6.6032  False
   T04_Michelle   T07_Geraldinn      1.8  0.988  -5.0847  8.6847  False
   T04_Michelle      T08_Karina     0.89    1.0   -7.542   9.322  False
  T05_Felicitas T06_Mara_Selena  -4.2167 0.2577   -9.838  1.4047  False
  T05_Felicitas   T07_Geraldinn   -2.735  0.841  -9.0198  3.5498  False
  T05_Felicitas      T08_Karina   -3.645 0.8036 -11.5947  4.3047  False
T06_Mara_Selena   T07_Geraldinn   1.4817 0.9935  -4.8032  7.7665  False
T06_Mara_Selena      T08_Karina   0.5717    1.0  -7.3781  8.5214  False
  T07_Geraldinn      T08_Karina    -0.91    1.0   -9.342   7.522  False
-----------------------------------------------------------------------

  Significant differences for SD1:
          Multiple Comparison of Means - Tukey HSD, FWER=0.05          
=======================================================================
     group1          group2     meandiff p-adj   lower    upper  reject
-----------------------------------------------------------------------
       T01_Mara       T02_Laura  -1.8121 0.9993 -12.7001  9.0759  False
       T01_Mara       T03_Nancy   0.2012    1.0  -9.6473 10.0498  False
       T01_Mara    T04_Michelle  -2.2062 0.9952 -12.0548  7.6423  False
       T01_Mara   T05_Felicitas   9.2979 0.0293   0.6123 17.9835   True
       T01_Mara T06_Mara_Selena  -0.9521    1.0  -9.6377  7.7335  False
       T01_Mara   T07_Geraldinn  -1.4888 0.9996 -11.3373  8.3598  False
       T01_Mara      T08_Karina  -3.8837 0.9713 -16.5982  8.8307  False
      T02_Laura       T03_Nancy   2.0133 0.9993   -10.27 14.2966  False
      T02_Laura    T04_Michelle  -0.3942    1.0 -12.6775 11.8891  False
      T02_Laura   T05_Felicitas    11.11 0.0592  -0.2621 22.4821  False
      T02_Laura T06_Mara_Selena     0.86    1.0 -10.5121 12.2321  False
      T02_Laura   T07_Geraldinn   0.3233    1.0   -11.96 12.6066  False
      T02_Laura      T08_Karina  -2.0717 0.9998  -16.753 12.6097  False
      T03_Nancy    T04_Michelle  -2.4075 0.9966 -13.7796  8.9646  False
      T03_Nancy   T05_Felicitas   9.0967 0.1193  -1.2846 19.4779  False
      T03_Nancy T06_Mara_Selena  -1.1533    1.0 -11.5346  9.2279  False
      T03_Nancy   T07_Geraldinn    -1.69 0.9997 -13.0621  9.6821  False
      T03_Nancy      T08_Karina   -4.085 0.9771 -18.0129  9.8429  False
   T04_Michelle   T05_Felicitas  11.5042 0.0218   1.1229 21.8854   True
   T04_Michelle T06_Mara_Selena   1.2542 0.9999  -9.1271 11.6354  False
   T04_Michelle   T07_Geraldinn   0.7175    1.0 -10.6546 12.0896  False
   T04_Michelle      T08_Karina  -1.6775 0.9999 -15.6054 12.2504  False
  T05_Felicitas T06_Mara_Selena   -10.25 0.0226 -19.5353 -0.9647   True
  T05_Felicitas   T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054   True
  T05_Felicitas      T08_Karina -13.1817 0.0486 -26.3131 -0.0503   True
T06_Mara_Selena   T07_Geraldinn  -0.5367    1.0 -10.9179  9.8446  False
T06_Mara_Selena      T08_Karina  -2.9317 0.9953 -16.0631 10.1997  False
  T07_Geraldinn      T08_Karina   -2.395 0.9991 -16.3229 11.5329  False
-----------------------------------------------------------------------

================================================================================
                         END OF REPORT
================================================================================
In [24]:
# =================================================================================
# Cell: Next-Step Recommendations - Analysis and Interpretation
# =================================================================================
# This notebook cell provides a complete workflow to address the key next-step
# recommendations from the initial analysis.
#
# It is structured to directly investigate:
# 1. Recommendation 1: Visualize Trajectories
#    - Implemented by running Generalized Additive Models (GAMs) to plot
#      non-linear individual and group trends over time.
#
# 2. Recommendation 2: Model Multivariate Links
#    - Implemented by first plotting a correlation heatmap to identify candidate
#      links, followed by a demonstrative multivariate mixed-effects model.
#
# 3. Recommendation 3: Validate Externally
#    - Addressed in a detailed markdown section at the end, explaining the
#      concept and identifying the key finding to validate in future studies.
# =================================================================================


# --- 1. Imports and Setup ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pygam import GAM, s, f
import statsmodels.formula.api as smf
import warnings

warnings.filterwarnings('ignore')

# --- 2. Plotting Style Configuration ---
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl", 8)
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 11
plt.rcParams['figure.dpi'] = 100


# --- 3. The AdvancedAnalysis Class ---
class AdvancedAnalysis:
    """
    A class to encapsulate the advanced analysis workflows based on the
    next-step recommendations.
    """
    def __init__(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
        self.data_path = data_path
        self.data = None
        self.parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
        self.crew_names = {
            'T01_Mara': 'Mara', 'T02_Laura': 'Laura', 'T03_Nancy': 'Nancy',
            'T04_Michelle': 'Michelle', 'T05_Felicitas': 'Felicitas',
            'T06_Mara_Selena': 'Mara Selena', 'T07_Geraldinn': 'Geraldinn', 'T08_Karina': 'Karina'
        }
        self.load_data()

    def load_data(self):
        """Loads and prepares the data."""
        print("--- Loading and Preparing Data ---")
        try:
            self.data = pd.read_csv(self.data_path)
            self.data['Subject'] = self.data['Subject'].astype(str)
            self.data['Sol'] = self.data['Sol'].astype(int)
            self.data['Crew_Name'] = self.data['Subject'].map(self.crew_names)
            self.data['Subject_code'] = self.data['Subject'].astype('category').cat.codes
            print(f"✓ Data loaded successfully from {self.data_path}")
        except Exception as e:
            print(f"✗ An error occurred during data loading: {e}")

    def visualize_trajectories(self):
        """
        Recommendation 1: Visualize Trajectories using GAMs.
        This method plots non-linear trends for each metric, which clearly
        shows individual trajectories and the overall group trend.
        """
        print("\n\n" + "="*80)
        print("### Recommendation 1: Visualize Trajectories ###")
        print("Using Generalized Additive Models (GAMs) to visualize non-linear trends.")
        print("="*80)

        for metric in self.parasympathetic_metrics:
            if metric not in self.data.columns: continue
            print(f"\n--- GAM Analysis for: {metric.upper()} ---")
            try:
                model_data = self.data[['Sol', 'Subject_code', 'Crew_Name', metric]].dropna()
                if len(model_data) < 15:
                    print("Not enough data to fit model.")
                    continue
                
                gam = GAM(s(0, n_splines=10) + f(1)).fit(model_data[['Sol', 'Subject_code']], model_data[metric])
                gam.summary()

                fig, ax = plt.subplots(1, 1, figsize=(12, 8))
                XX = gam.generate_X_grid(term=0)
                pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)

                ax.plot(XX[:, 0], pdep, color='royalblue', linewidth=3, label='GAM Group Trend')
                ax.fill_between(XX[:, 0], confi[:, 0], confi[:, 1], color='cornflowerblue', alpha=0.3, label='95% Confidence Interval')
                sns.scatterplot(x='Sol', y=metric, hue='Crew_Name', data=model_data, ax=ax, alpha=0.7, s=60)

                ax.set_title(f"Individual Trajectories and Group Trend for {metric.upper()}", fontsize=16, fontweight='bold')
                ax.set_xlabel("Sol (Mission Day)")
                ax.set_ylabel(f"{metric.upper()} Value")
                ax.legend(title='Subject', bbox_to_anchor=(1.05, 1), loc='upper left')
                plt.tight_layout()
                plt.show()
            except Exception as e:
                print(f"Could not fit GAM for {metric}: {e}")

    def model_multivariate_links(self):
        """
        Recommendation 2: Model Multivariate Links.
        Step 1: Use a heatmap to find strongly correlated metrics.
        Step 2: Run a demonstrative multivariate mixed-effects model.
        """
        print("\n\n" + "="*80)
        print("### Recommendation 2: Model Multivariate Links ###")
        print("="*80)
        
        # --- Step 2a: Identify potential links with a correlation heatmap ---
        print("\n--- Step 2a: Identifying links with a Correlation Heatmap ---")
        corr_matrix = self.data[self.parasympathetic_metrics].corr(method='spearman')
        plt.figure(figsize=(10, 8))
        sns.heatmap(corr_matrix, annot=True, cmap='viridis', fmt='.2f', linewidths=.5)
        plt.title('Spearman Correlation Heatmap of Parasympathetic Metrics', fontsize=16, fontweight='bold')
        plt.show()
        
        # --- Step 2b: Demonstrate a multivariate mixed-effects model ---
        print("\n--- Step 2b: Demonstrating a Multivariate Mixed-Effects Model ---")
        print("Based on the heatmap, RMSSD and SD1 are almost perfectly correlated (r > 0.99).")
        print("Let's build a model to see how SD1 predicts RMSSD, while controlling for time (Sol) and individual subjects.")
        
        try:
            # Formula: Predict RMSSD based on Sol and SD1, with a random intercept for each Subject
            model_formula = "rmssd ~ Sol + sd1"
            model = smf.mixedlm(model_formula, self.data, groups=self.data["Subject"])
            result = model.fit()
            
            print("\n--- Multivariate Model Summary (RMSSD ~ Sol + SD1) ---")
            print(result.summary())
            
            print("\n--- Interpretation ---")
            sd1_coef = result.params['sd1']
            sd1_p_value = result.pvalues['sd1']
            print(f"The coefficient for sd1 is {sd1_coef:.4f} with a p-value of {sd1_p_value:.4E}.")
            if sd1_p_value < 0.05:
                print("This is highly significant, confirming that SD1 is a very strong predictor of RMSSD, even when accounting for mission day and individual differences.")
                print("This result is expected, as they are mathematically related metrics, but it successfully demonstrates the multivariate modeling approach.")
            else:
                print("The relationship between SD1 and RMSSD was not significant in this model, which would be an unexpected finding.")

        except Exception as e:
            print(f"Could not fit multivariate model: {e}")

    def explain_external_validation(self):
        """
        Recommendation 3: Explain External Validation.
        This method prints a detailed explanation of this conceptual step.
        """
        print("\n\n" + "="*80)
        print("### Recommendation 3: External Validation (A Conceptual Framework) ###")
        print("="*80)
        print("""
This recommendation is conceptual and does not involve running new code on the current dataset. Instead, it outlines a critical next step for ensuring the research is robust and generalizable.

**What is External Validation?**
External validation is the process of testing your findings on a completely new, independent dataset. In this context, it would mean repeating the key parts of this analysis on data collected from a different analog mission with a different set of crew members.

**Why is it Important?**
It confirms that your findings are not just a coincidence or an idiosyncrasy of the specific 8 individuals in this study. If the same patterns emerge in a new cohort, it provides strong evidence that you have uncovered a genuine physiological phenomenon related to the analog mission environment.

**What is the KEY FINDING to Validate from This Study?**
Based on the ANOVA, Tukey HSD, and GAM results, the most powerful and statistically significant finding from your current dataset is the **high degree of inter-individual variability**.

- The ANOVA tests were significant for most metrics (RMSSD, pNN50, pNN20, hfnu, SD1).
- The Tukey HSD tests pinpointed which specific individuals were different from others (e.g., T05_Felicitas having significantly higher RMSSD than 5 other crew members).
- The GAM analysis also showed that the 'Subject' factor was highly significant.

**How to Validate This Finding:**
1.  Collect HRV data from a new cohort of analog astronauts.
2.  Run the same One-Way ANOVA analysis on the new data.
3.  **Validation Check:** If the ANOVA results are again significant (p < 0.05), you have externally validated the finding that individual differences are a major driver of parasympathetic HRV in this environment. This strengthens the conclusion that "one size does not fit all" and that personalized physiological monitoring is crucial.
""")

# --- 5. Main Execution Block ---
# Create an instance of the analyzer and run all the recommended analyses.

analyzer = AdvancedAnalysis()

if analyzer.data is not None:
    # Run and display the analysis for each recommendation
    analyzer.visualize_trajectories()
    analyzer.model_multivariate_links()
    analyzer.explain_external_validation()
else:
    print("\nAnalysis could not proceed because data failed to load.")
--- Loading and Preparing Data ---
✓ Data loaded successfully from C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv


================================================================================
### Recommendation 1: Visualize Trajectories ###
Using Generalized Additive Models (GAMs) to visualize non-linear trends.
================================================================================

--- GAM Analysis for: RMSSD ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -178.2912
Number of Samples:                           37 AIC:                                              379.8952
                                                AICc:                                             392.0156
                                                GCV:                                               98.0213
                                                Scale:                                             49.0325
                                                Pseudo R-Squared:                                   0.4969
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          7.65e-01                 
f(1)                              [0.6]                8            5.3          8.65e-03     **          
intercept                                              1            0.0          2.03e-06     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
No description has been provided for this image
--- GAM Analysis for: PNNI_50 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                   -80.8561
Number of Samples:                           37 AIC:                                               185.025
                                                AICc:                                             197.1454
                                                GCV:                                                6.3395
                                                Scale:                                              3.1712
                                                Pseudo R-Squared:                                   0.4271
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.94e-01                 
f(1)                              [0.6]                8            5.3          4.11e-02     *           
intercept                                              1            0.0          1.70e-01                 
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
No description has been provided for this image
--- GAM Analysis for: PNNI_20 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -156.9732
Number of Samples:                           37 AIC:                                              337.2592
                                                AICc:                                             349.3795
                                                GCV:                                               54.7781
                                                Scale:                                             27.4013
                                                Pseudo R-Squared:                                   0.4209
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          9.49e-01                 
f(1)                              [0.6]                8            5.3          4.71e-02     *           
intercept                                              1            0.0          2.86e-03     **          
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
No description has been provided for this image
--- GAM Analysis for: HF ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                   -352.976
Number of Samples:                           37 AIC:                                              729.2647
                                                AICc:                                             741.3851
                                                GCV:                                            11087.6823
                                                Scale:                                           5546.3138
                                                Pseudo R-Squared:                                   0.4016
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.62e-01                 
f(1)                              [0.6]                8            5.3          5.52e-02     .           
intercept                                              1            0.0          1.63e-01                 
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
No description has been provided for this image
--- GAM Analysis for: HFNU ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -110.8589
Number of Samples:                           37 AIC:                                              245.0305
                                                AICc:                                             257.1509
                                                GCV:                                               15.2293
                                                Scale:                                               7.618
                                                Pseudo R-Squared:                                   0.5816
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          8.23e-02     .           
f(1)                              [0.6]                8            5.3          4.28e-03     **          
intercept                                              1            0.0          1.88e-09     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
No description has been provided for this image
--- GAM Analysis for: SD1 ---
GAM                                                                                                       
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                     10.6564
Link Function:                     IdentityLink Log Likelihood:                                  -152.9113
Number of Samples:                           37 AIC:                                              329.1353
                                                AICc:                                             341.2557
                                                GCV:                                               49.0078
                                                Scale:                                             24.5148
                                                Pseudo R-Squared:                                   0.4969
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code   
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                10           5.4          7.65e-01                 
f(1)                              [0.6]                8            5.3          8.65e-03     **          
intercept                                              1            0.0          2.03e-06     ***         
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
No description has been provided for this image

================================================================================
### Recommendation 2: Model Multivariate Links ###
================================================================================

--- Step 2a: Identifying links with a Correlation Heatmap ---
No description has been provided for this image
--- Step 2b: Demonstrating a Multivariate Mixed-Effects Model ---
Based on the heatmap, RMSSD and SD1 are almost perfectly correlated (r > 0.99).
Let's build a model to see how SD1 predicts RMSSD, while controlling for time (Sol) and individual subjects.

--- Multivariate Model Summary (RMSSD ~ Sol + SD1) ---
        Mixed Linear Model Regression Results
======================================================
Model:            MixedLM Dependent Variable: rmssd   
No. Observations: 37      Method:             REML    
No. Groups:       8       Scale:              0.0000  
Min. group size:  2       Log-Likelihood:     121.4315
Max. group size:  8       Converged:          Yes     
Mean group size:  4.6                                 
------------------------------------------------------
          Coef.  Std.Err.    z     P>|z| [0.025 0.975]
------------------------------------------------------
Intercept -0.001    0.002   -0.459 0.646 -0.006  0.004
Sol        0.000    0.000    0.185 0.853 -0.000  0.000
sd1        1.414    0.000 9573.291 0.000  1.414  1.415
Group Var  0.000                                      
======================================================


--- Interpretation ---
The coefficient for sd1 is 1.4142 with a p-value of 0.0000E+00.
This is highly significant, confirming that SD1 is a very strong predictor of RMSSD, even when accounting for mission day and individual differences.
This result is expected, as they are mathematically related metrics, but it successfully demonstrates the multivariate modeling approach.


================================================================================
### Recommendation 3: External Validation (A Conceptual Framework) ###
================================================================================

This recommendation is conceptual and does not involve running new code on the current dataset. Instead, it outlines a critical next step for ensuring the research is robust and generalizable.

**What is External Validation?**
External validation is the process of testing your findings on a completely new, independent dataset. In this context, it would mean repeating the key parts of this analysis on data collected from a different analog mission with a different set of crew members.

**Why is it Important?**
It confirms that your findings are not just a coincidence or an idiosyncrasy of the specific 8 individuals in this study. If the same patterns emerge in a new cohort, it provides strong evidence that you have uncovered a genuine physiological phenomenon related to the analog mission environment.

**What is the KEY FINDING to Validate from This Study?**
Based on the ANOVA, Tukey HSD, and GAM results, the most powerful and statistically significant finding from your current dataset is the **high degree of inter-individual variability**.

- The ANOVA tests were significant for most metrics (RMSSD, pNN50, pNN20, hfnu, SD1).
- The Tukey HSD tests pinpointed which specific individuals were different from others (e.g., T05_Felicitas having significantly higher RMSSD than 5 other crew members).
- The GAM analysis also showed that the 'Subject' factor was highly significant.

**How to Validate This Finding:**
1.  Collect HRV data from a new cohort of analog astronauts.
2.  Run the same One-Way ANOVA analysis on the new data.
3.  **Validation Check:** If the ANOVA results are again significant (p < 0.05), you have externally validated the finding that individual differences are a major driver of parasympathetic HRV in this environment. This strengthens the conclusion that "one size does not fit all" and that personalized physiological monitoring is crucial.